EV3 AND EV4 SPECIFICATION 
DC227 and DC228 


Revision/Update Information: Version 2.0 May 3, 1991 


The EV3 and EV4 chips are the first in a family of microprocessors that implement the ALPHA
architecture.


The information in this document is subject to change without notice and should not be construed 
as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no 
responsibility for any errors that may occur in this document. 


This specification does not describe any program or product which is currently available from 
Digital Equipment Corporation. Nor does Digital Equipment Corporation commit to implement this 
specification in any product or program. Digital Equipment Corporation makes no commitment 
that this document accurately describes any product it might ever make. 


Copyright (C) 1991 by Digital Equipment Corporation 


Digital Restricted Distribution 


Digital Equipment Corporation 


Contents


Chapter 1 Introduction

1.1 Scope
1.2 EV4 Chip Features
1.3 EV3 Chip Features
1.4 Definitions
1.5 Terminology and Conventions
    1.5.1 Numbering
    1.5.2 UNPREDICTABLE And UNDEFINED
    1.5.3 Ranges And Extents
    1.5.4 Must be Zero (MBZ)
    1.5.5 Should be Zero (SBZ)
    1.5.6 Read As Zero (RAZ)
    1.5.7 Ignore (IGN)
    1.5.8 Register Format Notation


Chapter 2 EVx Micro-architecture

2.1 Introduction
2.2 Overview
2.3 Ibox
    2.3.1 Branch Prediction Logic
    2.3.2
    2.3.3 Interrupt Logic
    2.3.4 Performance Counters
2.4 Ebox
2.5 Abox
    2.5.1
    2.5.2
    2.5.3
    2.5.4 Write Buffer
        2.5.4.1 EV3 Write Buffer
        2.5.4.2 EV4 Write Buffer
2.6 Fbox
2.7 Cache Organization
    2.7.1 Data Cache
    2.7.2 Instruction Cache
2.8 Pipeline Organization
2.9 Scheduling and Issuing Rules
    2.9.1 Instruction Class Definition
    2.9.2 Producer-Consumer Latency Matrix
    2.9.3 Producer-Producer Latency
    2.9.4 EVx Issue Rules
        2.9.4.1 EV3 Specific Issue Rules
        2.9.4.2 EV4 Specific Issue Rules
    2.9.5 Dual Issue Rules


Chapter 3 Privileged Architecture Library Code

3.1 Introduction
3.2
3.3 Special PAL Instructions
    3.3.1 HW_MFPR and HW_MTPR
    3.3.2 HW_LD and HW_ST
    3.3.3 HW_REI
3.4 PAL Entry Points
3.5 General PALmode Restrictions
    3.5.1
    3.5.2 EV3 Specific PALmode Restrictions
    3.5.3 EV4 Specific PALmode Restrictions
3.6
3.7 TB Miss Flows
    3.7.1
    3.7.2
3.8 Ibox IPRs
    3.8.1 TB_TAG
    3.8.2
    3.8.3
        3.8.3.1 Performance Counters
    3.8.4
    3.8.5 EXC_ADDR
    3.8.6 SL_CLR
    3.8.7
    3.8.8
    3.8.9
    3.8.10
    3.8.11
    3.8.12
    3.8.13 PAL_BASE
    3.8.14
    3.8.15
    3.8.16
    3.8.17
    3.8.18
    3.8.19
    3.8.20
3.9
    3.9.1
    3.9.2
    3.9.3 DTB_PTE_TEMP
    3.9.4
    3.9.5
    3.9.6
    3.9.7
    3.9.8
    3.9.9 FLUSH_IC
    3.9.10 FLUSH_IC_ASM
    3.9.11 ABOX_CTL
    3.9.12
    3.9.13
    3.9.14
    3.9.15
3.10
    3.10.1
    3.10.2 DC_ADDR
    3.10.3
    3.10.4 BIU_ADDR
    3.10.5
    3.10.6 FILL_SYNDROME
    3.10.7 BC_TAG
3.11 ECC Error Correction
3.12 Error Flows
    3.12.1 EV3 error flows
        3.12.1.1 I-stream ECC error
        3.12.1.2 D-stream ECC error
        3.12.1.3 BIU: tag address parity error
        3.12.1.4 BIU: tag control parity error
        3.12.1.5 BIU: system transaction terminated with CACK_SERR
        3.12.1.6 BIU: system transaction terminated with CACK_HERR
        3.12.1.7 BIU: I-stream parity error (parity mode only)
        3.12.1.8 BIU: D-stream parity error (parity mode only)
    3.12.2 EV4 error flows
        3.12.2.1 I-stream ECC error
        3.12.2.2 D-stream ECC error
        3.12.2.3 BIU: tag address parity error
        3.12.2.4 BIU: tag control parity error
        3.12.2.5 BIU: system external transaction terminated with CACK_SERR
        3.12.2.6 BIU: system transaction terminated with CACK_HERR
        3.12.2.7 BIU: I-stream parity error (parity mode only)
        3.12.2.8 BIU: D-stream parity error (parity mode only)


Chapter 4 External Interface

4.1
4.2
    4.2.1
    4.2.2 DCOK and Reset
    4.2.3 Initialization and Diagnostic Interface
    4.2.4 Address Bus
    4.2.5 Data Bus
    4.2.6 External Cache Control
        4.2.6.1 The TagAdr RAM
        4.2.6.2 The TagCtl RAM
        4.2.6.3 The Data RAM
        4.2.6.4
        4.2.6.5 External Cache Access
            4.2.6.5.1 HoldReq and HoldAck
            4.2.6.5.2 TagOk
    4.2.7 External Cycle Control
    4.2.8 Primary Cache Invalidate
    4.2.9
    4.2.10 Electrical Level Configuration
    4.2.11 Performance Monitoring
    4.2.12
    4.2.13 Continuity
4.3
4.4 Transactions
    4.4.1
    4.4.2 Fast External Cache Read Hit
    4.4.3 Fast External Cache Write Hit
    4.4.4 READ_BLOCK Transaction
    4.4.5 Write Block
    4.4.6 LDxL Transaction
    4.4.7 STxC Transaction
    4.4.8 BARRIER Transaction
    4.4.9 FETCH Transaction
    4.4.10 FETCHM Transaction


Chapter 5 DC Characteristics

5.1 Overview
    5.1.1 Power Supply
    5.1.2 Reference Supply
    5.1.3 Input Clocks
    5.1.4 Signal pins
5.2
    5.2.1 Power Supply
    5.2.2 Reference Supply
    5.2.3 Inputs
    5.2.4 Outputs
    5.2.5 Bidirectionals
5.3 Power Dissipation


Chapter 6 AC Characteristics

6.1
6.2 Input Clocks
6.3
6.4 Test Configuration
6.5 Fast Cycles on External Cache
    6.5.1 Fast Read Cycles
    6.5.2 Fast Write Cycles
6.6 External Cycles
6.7
6.8
6.9 Tester Considerations
    6.9.1 Asynchronous Inputs
    6.9.2 Signals Timed from Cpu Clock
6.10


Chapter 7 Package


Chapter 8 Pinout

8.1
8.2 Change History
8.3
8.4 EV4/NVAX+ Pinout Differences


Appendix A EV3 and EV4 Chip Summary


Figures

7-1 PGA Cavity Up View


Tables

1 Revision History
1-1 Register Field Type Notation
1-2 Register Field Notation
2-1 Producer-Consumer Classes
2-2 Dual Issue Rules
3-1 HW_MFPR and HW_MTPR Format Description
3-2
3-3 HW_LD and HW_ST Format Description
3-4 The HW_REI Format Description
3-5 PAL Entry Points
3-6 D-stream Error PAL Entry Points
3-7
3-8 IPR Reset State
3-9
3-10 BHE,BPE Branch Prediction Selection
3-11 Performance Counter 0 Input Selection
3-12 Performance Counter 1 Input Selection
3-13
3-14
3-15
Abox Control Register
BIU Control Register
BC_SIZE
Dcache Status Register
Dcache STAT Error Modifiers
Syndromes for Single-Bit Errors
System Clock Divisor
System Clock Delay
Icache Test Modes
Tag Control Encodings
Acknowledgment Types
Read Data Acknowledgment Types
CMOS DC Characteristics
EV4 Power Dissipation @Vdd=3.45V
Input Clock Timing
External Cycles
Asynchronous Signals on a Tester
EV3 Chip Summary and Micro-architecture
EV4 Chip Summary and Micro-architecture


Table 1: REVISION History 


Revision Date Description 
1.0 19-May-1990 Initial Release 
2.0 May 3, 1991 All EV3 and EV4 issues are closed 


Chapter 1 
Introduction 


1.1 Scope 


This document describes the EV3 and EV4 chips, a family of microprocessors that implement 
the ALPHA architecture. This specification describes the external interface and programming 
information specific to the actual implementation. It does not describe the detailed imple- 
mentation of the chip nor the ALPHA architecture. The reader is referred to the ALPHA 
system reference manual for the architectural specification. 


1.2 EV4 Chip Features 


The EV4 microprocessor is a CMOS-4 (0.75 micron) super-scalar, super-pipelined implementation
of the ALPHA architecture. It will become the basis of the first family of ALPHA products.
The EV4 chip is designed to meet the requirements of a wide variety of systems, ranging from
uni-processor workstations to mid-range multiprocessors. To achieve this goal, EV4 enforces
as little policy as possible; for example, it does not enforce a particular cache coherence
scheme. EV4 attempts to spread the design compromises fairly over the range of customers'
requirements. The design balances the cost goals of the low-end workstation with the
performance goals of the mid-range multiprocessors.


EV4 features: 


• ALPHA instructions to support byte, word, longword, quadword, DEC F_floating, G_floating,
  and IEEE S_floating and T_floating data types. Limited support is provided for DEC
  D_floating operations. It implements the architecturally optional instructions FETCH and
  FETCH_M.


• Demand paged memory management unit which, in conjunction with properly written
  PALcode, fully implements the ALPHA memory management architecture. The translation
  buffer can be used with alternative PALcode to implement a different page table
  structure.


• On-chip 8-entry I-stream TB and 32-entry D-stream TB, each of which maps 8Kbyte pages,
  and a four-entry D-stream TB which maps aligned groups of 512 8Kbyte pages.


• World class performance. At its nominal frequency EV4 achieves a 6.6ns cycle time. Cycle
  times of 5ns are possible by binning the parts.


• Low average cycles per instruction (CPI). The EV4 chip can issue two ALPHA instructions
  in a single cycle, thereby minimizing the average CPI. Branch history tables are
  also used to minimize the branch latency, further reducing the average CPI.


• On-chip high-throughput floating point unit, supporting both DEC and IEEE floating
  point data types.


• On-chip 8Kbyte data cache and an 8Kbyte physical instruction cache with ASN support.


• On-chip write buffer with four 32-byte entries.


• On-chip performance counters to measure and analyze CPU and system performance.


• Bus interface unit, which contains logic to directly access external cache RAMs without
  CPU module action. The size and access time of the external cache are programmable.


• An instruction cache diagnostic interface to support chip and module level testing.


• An internal clock generator which generates both a high-speed clock needed by the chip
  itself, and a pair of system clocks for use by the CPU module.


The EV4 chip is packaged in 431-pin (24 x 24, 100 mil pin pitch) PGA packages. The heat
sinks are separable and application specific.
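

The feature list above implies a few derived quantities, such as how much address space each
translation buffer can map and what clock frequency corresponds to the quoted cycle times.
The following Python sketch works them out. The function and variable names are illustrative
assumptions and do not come from this specification.

```python
# Derived figures for the EV4 feature list. All names here are illustrative.
PAGE_BYTES = 8 * 1024  # each TB entry maps 8Kbyte pages


def tb_reach_bytes(entries, pages_per_entry=1):
    """Address space covered when every entry of a TB is valid."""
    return entries * pages_per_entry * PAGE_BYTES


def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


itb_reach = tb_reach_bytes(8)             # 8-entry I-stream TB
dtb_reach = tb_reach_bytes(32)            # 32-entry D-stream TB
dtb_large_reach = tb_reach_bytes(4, 512)  # 4 entries, 512 pages per entry
write_buffer_bytes = 4 * 32               # four 32-byte entries

if __name__ == "__main__":
    print("ITB reach (Kbytes):", itb_reach // 1024)
    print("DTB reach (Kbytes):", dtb_reach // 1024)
    print("Large-page DTB reach (Mbytes):", dtb_large_reach // (1024 * 1024))
    print("Write buffer (bytes):", write_buffer_bytes)
    print("Nominal clock (MHz):", round(clock_mhz(6.6), 1))
    print("Binned clock (MHz):", round(clock_mhz(5.0), 1))
```

Under these assumptions the 8-entry and 32-entry TBs map 64 Kbytes and 256 Kbytes, the
four-entry TB maps 16 Mbytes (4 Mbytes per entry), and the 6.6ns and 5ns cycle times
correspond to roughly 151.5 MHz and 200 MHz.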


1.3 EV3 Chip Features 


The EV3 microprocessor is an early variant of EV4 fabricated in CMOS-3 (1 micron). It is 
intended to be used during system-level debug of the first ALPHA products and will be used 
by the ALPHA Demonstration Unit. It is pin compatible with EV4, so no significant system- 
level changes are needed to transition from EV3 to EV4. Because it is fabricated in less dense 
technology, it has less functionality and a slower cycle time than EV4. 


The primary differences between EV3 and EV4 are: 


• The nominal cycle time for EV3 is extended from 6.6ns to 10ns. The external interface
  is designed such that running the CPU at a reduced clock rate does not require that
  all of the logic surrounding the CPU run at reduced speed.


• EV3 does not provide an on-chip floating point unit. Floating point instructions may be
  trapped for emulation if desired.


• EV3 primary caches are smaller. The Icache and Dcache are both 1Kbyte.


• EV3 uses a simpler branch prediction algorithm, with no branch history table.


• Performance counters are not included in EV3.


1.4 Definitions 


This document is the specification for both the EV3 and EV4 chips. Because the bulk of the
functionality is the same for both chips, the remainder of the spec will use the term EVx
to represent both EV3 and EV4 in discussions of features which are common to both chips.
Discussions of features which are unique to a particular chip will use the name of that chip
(EV3 or EV4).


1.5 Terminology and Conventions
1.5.1 Numbering 


All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other 
than decimal are indicated with the name of the base following the number in parentheses, 
e.g., FF(hex). 
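

As an illustration of this convention, the following Python sketch parses a number written in
the notation above. Only "hex" appears in this specification; the other base names in the
table, and the function name itself, are assumptions made for the example.

```python
import re

# Only "hex" appears in the specification text; the other names are assumptions.
BASE_NAMES = {"hex": 16, "decimal": 10, "octal": 8, "binary": 2}
NUMBER_RE = re.compile(r"^\s*([0-9A-Za-z]+)\s*(?:\(\s*([A-Za-z]+)\s*\))?\s*$")


def parse_spec_number(text):
    """Parse 'FF(hex)' or a bare decimal number such as '431' into an int."""
    match = NUMBER_RE.match(text)
    if match is None:
        raise ValueError(f"not a number in spec notation: {text!r}")
    digits, base_name = match.groups()
    if base_name is None:
        return int(digits, 10)  # decimal unless otherwise indicated
    try:
        base = BASE_NAMES[base_name.lower()]
    except KeyError:
        raise ValueError(f"unknown base name: {base_name!r}") from None
    return int(digits, base)
```

With this sketch, `parse_spec_number("FF(hex)")` yields 255, while a bare `"FF"` is rejected
because an unqualified number is taken to be decimal.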


1.5.2 UNPREDICTABLE And UNDEFINED 


Throughout this specification, the terms UNPREDICTABLE and UNDEFINED are used. 
Their meanings are quite different and must be carefully distinguished. One key difference 
is that only privileged software (that is, software running in kernel mode) may trigger 
UNDEFINED operations, whereas either privileged or unprivileged software may trigger 
UNPREDICTABLE results or occurrences. A second key difference is that UNPREDICTABLE 
results and occurrences do not disrupt the basic operation of the processor; the processor 
continues to execute instructions in its normal manner. In contrast, UNDEFINED operation 
may halt the processor or cause it to lose information. 


A result specified as UNPREDICTABLE may acquire an arbitrary value subject to a few 
constraints. Such a result may be an arbitrary function of the input operands or of any state 
information that is accessible to the process in its current access mode. UNPREDICTABLE
results may be unchanged from their previous values. UNPREDICTABLE results must not 
be security holes. Specifically, UNPREDICTABLE results must not do any of the following: 


• Depend on or be a function of the contents of memory locations or registers which are
  inaccessible to the current process in the current access mode.


• Write or modify the contents of memory locations or registers to which the current process
  in the current access mode does not have access.


• Halt or hang the system or any of its components.


For example, a security hole would exist if some UNPREDICTABLE result depended on the 
value of a register in another process, on the contents of processor temporary registers left 
behind by some previously running process, or on a sequence of actions of different processes. 


An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice 
function. The choice function is subject to the same constraints as are UNPREDICTABLE 
results and, in particular, must not constitute a security hole. 


Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, 
implementation to implementation, and instruction to instruction within implementations. 
Software can never depend on results specified as UNPREDICTABLE. 


Operations specified as UNDEFINED may vary from moment to moment, implementation to 
implementation, and instruction to instruction within implementations. The operation may 
vary in effect from nothing, to stopping system operation. UNDEFINED operations must not 
cause the processor to hang, i.e., reach an unhalted state from which there is no transition 
to a normal state in which the machine executes instructions. Only privileged software (that 
is, software running in kernel mode) may trigger UNDEFINED operations. 


1.5.3 Ranges And Extents


Ranges are specified by a pair of numbers separated by a ".." and are inclusive, e.g., a range 
of integers 0..4 includes the integers 0, 1, 2, 3, and 4. 


Extents are specified by a pair of numbers in angle brackets separated by a colon and are 
inclusive, e.g., bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3. 
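

Both notations are inclusive at each end, which is easy to get wrong in software that uses
half-open ranges. The Python sketch below shows one way to expand a range and to build a
mask for, or extract, an extent such as <7:3>. The helper names are illustrative assumptions.

```python
def range_inclusive(low, high):
    """Expand the range low..high, inclusive at both ends."""
    return list(range(low, high + 1))


def extent_bits(high, low):
    """List the bit positions named by the extent <high:low>."""
    return list(range(high, low - 1, -1))


def extent_mask(high, low):
    """Mask with ones in bit positions high down to low, inclusive."""
    width = high - low + 1
    return ((1 << width) - 1) << low


def extract_extent(value, high, low):
    """Return the field <high:low> of value, shifted down to bit 0."""
    return (value & extent_mask(high, low)) >> low
```

For example, `range_inclusive(0, 4)` gives the five integers 0 through 4, and
`extent_mask(7, 3)` gives F8(hex), covering bits 7, 6, 5, 4, and 3.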


1.5.4 Must be Zero (MBZ) 


Fields specified as Must Be Zero (MBZ) must never be filled by software with a non-zero 
value. If the processor encounters a non-zero value in a field specified as MBZ, a Reserved 
Operand exception occurs. 


1.5.5 Should be Zero (SBZ) 


Fields specified as Should Be Zero (SBZ) should be filled by software with a zero value. 
These fields may be used at some future time. Non-zero values in SBZ fields produce 
UNPREDICTABLE results. 


1.5.6 Read As Zero (RAZ) 
Fields specified as Read As Zero (RAZ) return a zero when read. 


1.5.7 Ignore (IGN) 
Fields specified as Ignore (IGN) are ignored when written. 
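

The four field conventions above can be summarized with bit masks. The Python sketch below is
a software model only, assuming each convention is described by a mask of the affected bits.
The exception class, the function names, and the choice to leave IGN bits unchanged on a write
are assumptions made for illustration.

```python
class ReservedOperand(Exception):
    """Stand-in for the Reserved Operand exception raised on an MBZ violation."""


def check_mbz(value, mbz_mask):
    """Raise if any Must Be Zero bit is set in value."""
    if value & mbz_mask:
        raise ReservedOperand(f"non-zero MBZ bits: {value & mbz_mask:#x}")


def sbz_violations(value, sbz_mask):
    """Return the Should Be Zero bits that are set; non-zero means UNPREDICTABLE."""
    return value & sbz_mask


def read_with_raz(stored, raz_mask):
    """Model a read: Read As Zero bits always return zero."""
    return stored & ~raz_mask


def write_with_ign(stored, written, ign_mask):
    """Model a write: Ignore bits keep their old state (an assumed behavior)."""
    return (stored & ign_mask) | (written & ~ign_mask)
```

In this model, writing FF(hex) over a zero value with an IGN mask of F0(hex) changes only the
low four bits, and reading FF(hex) through a RAZ mask of 0F(hex) returns F0(hex).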


1.5.8 Register Format Notation 


This specification contains a number of figures that show the format of various registers, 
followed by a description of each field. In general, the fields on the register are labeled with 
either a name or a mnemonic. The description of each field includes the name or mnemonic, 
the bit extent, and the type. 


The “Type” column in the field description includes both the actual type of the field, and 
an optional initialized value, separated from the type by a comma. The type denotes the 
functional operation of the field, and may be one of the values shown in Table 1-1. If present, 
the initialized value indicates that the field is initialized by hardware to the specified value 
at powerup. If the initialized value is not present, the field is not initialized at powerup. 


Table 1-1: Register Field Type Notation


Notation   Description
RW         A read-write bit or field. The value may be read and written by software.

RO         A read-only bit or field. The value may be read by software. It is written by
           hardware; software writes are ignored.

WO         A write-only bit or field. The value may be written by software. It is used by
           hardware and reads by software return an UNPREDICTABLE result.

WZ         A write bit or field. The value may be written by software. It is used by hardware
           and reads by software return a 0.

W1C        A write-one-to-clear bit. If reads are allowed to the register then the value may
           be read by software. If it is a write-only register then a read by software returns
           an UNPREDICTABLE result. Software writes of a 1 cause the bit to be cleared by
           hardware. Software writes of a 0 do not modify the state of the bit.

W0C        A write-zero-to-clear bit. If reads are allowed to the register then the value may
           be read by software. If it is a write-only register then a read by software returns
           an UNPREDICTABLE result. Software writes of a 0 cause the bit to be cleared by
           hardware. Software writes of a 1 do not modify the state of the bit.

WA         A write-anything-to-the-register-to-clear bit. If reads are allowed to the register
           then the value may be read by software. If it is a write-only register then a read
           by software returns an UNPREDICTABLE result. A software write of any value to
           the register causes the bit to be cleared by hardware.

RC         A read-to-clear field. The value is written by hardware and remains unchanged
           until read. The value may be read by software, at which point hardware may write
           a new value into the field.
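

The field types in Table 1-1 can be read as rules for how a software access changes or
observes a single bit. The Python sketch below models those rules for a one-bit field. It is
not a description of the hardware: the function names are illustrative, UNPREDICTABLE is
represented by `None`, and treating software writes to an RC field as having no effect is an
assumption, since the table does not say.

```python
UNPREDICTABLE = None  # stand-in for a value software must not rely on


def software_write(kind, current, written):
    """Return the state of a one-bit field after software writes `written`."""
    if kind in ("RW", "WO", "WZ"):
        return written
    if kind == "RO":
        return current                  # software writes are ignored
    if kind == "RC":
        return current                  # assumption: not specified by Table 1-1
    if kind == "W1C":
        return 0 if written == 1 else current
    if kind == "W0C":
        return 0 if written == 0 else current
    if kind == "WA":
        return 0                        # any write to the register clears the bit
    raise ValueError(f"unknown field type: {kind!r}")


def software_read(kind, current, register_readable=True):
    """Return the value software observes when reading a one-bit field."""
    if kind == "WZ":
        return 0
    if kind == "WO":
        return UNPREDICTABLE
    if kind in ("W1C", "W0C", "WA"):
        return current if register_readable else UNPREDICTABLE
    if kind in ("RW", "RO", "RC"):
        return current
    raise ValueError(f"unknown field type: {kind!r}")
```

The model makes the asymmetry between W1C and W0C explicit: writing a 1 clears a set W1C bit
but leaves a set W0C bit unchanged, and writing a 0 does the opposite.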


In addition to named fields in registers, other bits of the register may be labeled with one of
the three symbols listed in Table 1-2. These symbols denote the type of the unnamed fields
in the register.


Table 1-2: Register Field Notation


Notation   Description
RAZ        Denotes a register bit(s) that is read as a 0.

IGN        Denotes a register bit(s) that is ignored on write and UNPREDICTABLE when
           read if not otherwise specified.


Chapter 2
EVx Micro-architecture 


2.1 Introduction 


This chapter gives a programmer and system designer view of the EVx micro-architecture.
It is not intended to be a detailed hardware description of the chip. The reader is referred to
the behavioral model for an accurate and highly detailed specification of the chip. Describing
the micro-architecture of a heavily pipelined machine is always problematic. To understand
the hardware you need to understand the pipeline, but it is very difficult to describe the
pipeline without a hardware description. This spec first describes the hardware with only
minimal forward references to the pipeline and then presents the pipeline. EVx can issue
two instructions in a single cycle; the scheduling and dual issue rules are defined at the end
of the chapter.


It is important to realize that the combination of EVx and PALcode implements the ALPHA 
architecture. Many hardware design decisions were based on specific PAL functionality. These 
PAL assumptions and restrictions are detailed in the next chapter. The important point to 
keep in mind is that if a certain piece of hardware appears to be "architecturally incomplete",
the missing functionality is implemented in PALcode. 


2.2 Overview 


The EV4 microprocessor consists of three independent execution units: integer execution unit 
(Ebox), floating point unit (Fbox), and the address generation, memory management, write 
buffer and bus interface unit (Abox). EV3 does not contain a floating point unit. Each unit
can accept at most one instruction per cycle; however, if code is properly scheduled, EVx can
issue two instructions to two independent units in a single cycle. A fourth unit, the Ibox,
is the central control unit. It issues instructions, maintains the pipeline and performs all of
the PC calculations. EVx also has on-chip instruction and data caches (Icache and Dcache).
The major functional difference between EV4 and EV3 is that EV3 does not include a floating
point unit.


2.3 Ibox


The primary function of the Ibox is to issue instructions to the Ebox, Abox and Fbox. In order 
to provide those instructions, the Ibox also contains the prefetcher, PC pipeline, ITB, abort 
logic, register conflict or dirty logic, and exception logic. The Ibox decodes two instructions 
in parallel and checks that the required resources are available for both instructions. If 
resources are available and dual issue is possible then both instructions may be issued. The 
section on Dual Issue Rules details which instructions can be dual issued. If the resources 
are available for only the first instruction or the instructions cannot be dual issued then the 
Ibox issues only the first instruction. The Ibox does NOT issue instructions out of order, even 
if the resources are available for the second instruction and not for the first instruction. The 
Ibox does not issue instructions until the resources for the first instruction become available. 
If only the first of a pair of instructions issues, the Ibox does not advance another instruction 
to attempt to dual issue again. Dual issue is only attempted on aligned quadword pairs. 
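

The in-order issue decision described above can be sketched as follows. This is a minimal
illustration only, not the chip's logic: the function name issue_pair and its three boolean
inputs are hypothetical stand-ins for the Ibox resource checks and the Dual Issue Rules.

```python
def issue_pair(first_ready, second_ready, can_dual_issue):
    """Return how many instructions of an aligned pair issue this cycle (0, 1 or 2).

    first_ready / second_ready: resources are available for that instruction.
    can_dual_issue: the pair satisfies the dual issue rules (assumed precomputed).
    """
    if not first_ready:
        # In-order issue: the second instruction never passes the first,
        # even if its own resources are available.
        return 0
    if second_ready and can_dual_issue:
        return 2
    # Only the first instruction issues; the second waits and issues alone later.
    return 1
```

For example, issue_pair(False, True, True) yields 0, reflecting that nothing issues until
the resources for the first instruction become available.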


2.3.1 Branch Prediction Logic 


The Ibox contains the branch prediction logic. EV4 offers a choice of branch prediction 
strategies selectable through the ICCSR IPR. The Icache records the outcome of branch 
instructions in a single history bit provided for each instruction location in the cache. This 
information can be used as the prediction for the next execution of the branch instruction. 
The prediction for the first execution of a branch instruction is based on the sign of the 
displacement field within the branch instruction itself. If the sign bit is negative, conditional 
branches are predicted to be taken. If the sign is positive, conditional branches are predicted 
to be not taken. Alternatively, if the history table is disabled, branches can be predicted based 
on the sign of the displacement field at all times. 
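

The EV4 prediction choices described above can be sketched as a small function. This is an
illustrative model under stated assumptions: the name predict_taken is hypothetical, a
history bit value of 1 is assumed to mean "taken last time" (the encoding is not given
here), and None is used to represent a branch with no recorded history.

```python
def predict_taken(displacement, history_bit=None, history_enabled=True):
    """Predict whether a conditional branch is taken.

    displacement: signed branch displacement field.
    history_bit: recorded outcome of the previous execution (assumed 1 = taken),
                 or None if the branch has not executed since the Icache fill.
    history_enabled: False models the history table being disabled.
    """
    if history_enabled and history_bit is not None:
        # Use the recorded outcome as the prediction for the next execution.
        return bool(history_bit)
    # First execution, or history disabled: backward branches (negative
    # displacement) are predicted taken, forward branches not taken.
    return displacement < 0
```

Passing history_enabled=False also models sign-of-displacement prediction used at all times.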


The EV3 chip provides only sign-of-displacement branch prediction.


Both chips provide a 4-entry subroutine return stack which is controlled by the hint bits in the 
BSR, HW_REI, and jump to subroutine instructions (JMP, JSR, RET, or JSR_COROUTINE). 
Both chips also provide a means of disabling all branch prediction hardware. 


2.3.2 ITB 


The Ibox contains an 8-entry fully associative translation buffer to cache recently used 
instruction-stream address translations and protection information for 8Kbyte pages. The 
ITB uses a not-last-used replacement algorithm. The ITB is filled and maintained by PALcode. 
Unlike the DTB, it is not possible to write to the ITB in native (non-PAL) mode. The chapter
on PALcode details the ITB miss flow. 


While not executing in PAL mode, the 43-bit virtual program counter (VPC) is presented each 
cycle to the ITB. If the PTE associated with the VPC is cached in the ITB then the PFN and 
protection bits for the page which contains the VPC are used by the Ibox to complete the 
address translation and access checks. 


The EVx ITB supports a single address space number (ASN) via the PTE[ASM] bit. PALcode
which supports writes to the architecturally-defined TBIAP register does so by using the
hardware-specific HW_MTPR instruction to write to the hardware-specific ITBASM register.
This has the effect of invalidating ITB entries which do not have their corresponding ASM
bits set.


2.3.3 Interrupt Logic


The EVx chip supports three sources of interrupts: hardware, software and asynchronous
system trap (AST). There are six level-sensitive hardware interrupts sourced by pins, 15 
software interrupts sourced by an on-chip IPR (SIRR), and 4 AST interrupts sourced by a 
second internal IPR (ASTRR). All interrupts are independently maskable via on-chip enable 
registers to support a software controlled mechanism for prioritization. In addition, AST 
interrupts are qualified by the current processor mode. The EV4 chip further qualifies AST 
interrupts with the current state of SIER[2]. EV3 supports this function in PALcode. All
interrupts are disabled when the processor is executing PALcode. 


By providing distinct enable bits for each independent interrupt source, a software controlled 
interrupt priority scheme can be implemented with maximum flexibility. For example, a six 
level interrupt priority scheme can be supported for the six hardware interrupt request pins 
by defining a distinct state of the corresponding hardware interrupt enable register for each 
IPL. The current interrupt priority is determined by the state of the interrupt enable register. 
The lowest interrupt priority level is produced by enabling all 6 interrupts, i.e. bits 6-1. The
next is produced by enabling bits 6-2 and so on to the highest interrupt priority level which
is produced by enabling only bit 6 and disabling bits 5 through 1. When all interrupt enable
bits are cleared, the processor cannot be interrupted from the hardware interrupt request
register. Each state, 6-1, 6-2, 6-3, 6-4, 6-5, 6, represents an individual interrupt priority level
(IPL). If these states are the only states allowed in the interrupt enable register, a six level 
hardware interrupt priority scheme can be controlled entirely by software. 
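

The six enable-register states described above can be generated mechanically. The sketch
below is illustrative only: the names hw_enable_mask and deliverable are hypothetical, the
level numbering (1 = lowest, 6 = highest) is chosen here for convenience, and placing
enable bit n at integer bit position n is an assumption, not the actual register layout.

```python
def hw_enable_mask(level):
    """Enable mask for priority state `level` (1 = lowest ... 6 = highest).

    Level k enables interrupt bits 6 down to k, so level 1 enables bits 6-1
    and level 6 enables only bit 6. Bit n is modeled as integer bit n.
    """
    if not 1 <= level <= 6:
        raise ValueError("level must be in 1..6")
    mask = 0
    for bit in range(level, 7):
        mask |= 1 << bit
    return mask


def deliverable(level, request_bits):
    """Return the pending hardware requests that are enabled at this level."""
    return request_bits & hw_enable_mask(level)
```

At level 3, for instance, a request on bit 2 is masked while a request on bit 5 is still
delivered. A mask of zero corresponds to all enable bits cleared.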


The scheme is extendible to provide multiple interrupt sources at the same interrupt priority 
level by grouping enable bits. Groups of enable bits must be set and cleared together to 
support multiple interrupts of equal priority level. Of course, this method reduces the total 
available number of distinct levels. 


Since enable bits are provided for all hardware, software and AST interrupt requests, a 
priority scheme can span all sources of processor interrupts. The only exception to this rule 
regards the restriction on AST interrupt requests as described below. 


Four AST interrupts are provided; one for each processor mode. AST interrupt requests are 
qualified such that AST requests corresponding to a given mode are blocked whenever the 
processor is in a higher mode regardless of the state of the AST interrupt enable register. 
In addition, all AST interrupt requests are qualified in EV4 with SIER[2] to disable AST 
requests when IPL is higher than 2. This function is provided in PALcode for EV3. 


When the processor receives an interrupt request and that request is enabled, an interrupt is 
reported or delivered to the exception logic if the processor is not currently executing PALcode. 
Before vectoring to the interrupt service PAL dispatch address, the pipeline is completely 
drained and all outstanding load instructions are completed. The restart address is saved 
in the Exception Address IPR (EXC_ADDR) and the processor enters PALmode. The cause 
of the interrupt may be determined by examining the state of any of the interrupt request 
registers. 


Note that hardware interrupt requests are level sensitive and therefore may be removed 
before an interrupt is serviced. If they are removed before the interrupt request register is 
read, the register will return a zero value.


2.3.4 Performance Counters


The EV4 chip contains a performance recording feature. The implementation of this feature 
provides a mechanism to count various hardware events and cause an interrupt upon counter 
overflow. Interrupts are triggered six cycles after the event, and therefore, the exception PC 
may not reflect the exact instruction causing counter overflow. Two counters are provided to 
allow accurate comparison of two variables under a potentially non-repeatable experimental 
condition. Counter inputs include issues, non-issues, total cycles, pipe dry, pipe freeze, 
mispredicts and cache misses as well as counts for various instruction classifications. In 
addition, one chip pin input to each counter is provided to measure external events at a rate 
determined by the selected system clock speed. Performance counters are not present in EV3. 


2.4 Ebox 


The Ebox contains the 64-bit integer execution datapath: adder, logic box, barrel shifter, byte 
zapper, bypassers and integer multiplier. The integer multiplier retires 4 bits per cycle. The 
Ebox also contains the 32-entry 64-bit integer register file. The register file has four read 
ports and two write ports which allow the sourcing (sinking) of operands (results) to both the 
integer execution datapath and the Abox.


2.5 Abox 


The Abox contains six major sections: address translation datapath, load silo, write buffer, 
Dcache interface, IPRs and the external bus interface unit (BIU). The address translation
datapath has a displacement adder which generates the effective virtual address for load and
store instructions, and a pair of translation buffers which generate the corresponding physical
address. 


2.5.1 DTB 


EVx contains a 32-entry fully associative translation buffer which caches recently used
data-stream page table entries for 8Kbyte pages, and a four-entry fully associative translation
buffer which supports the largest granularity hint option (512*8Kbyte pages) as described in
the ALPHA SRM. Both translation buffers use a not-last-used replacement algorithm. They
are hereafter referred to as the small-page and large-page DTBs, respectively. PALcode is
responsible for ensuring that a particular PTE is never contained in both the small- and
large-page DTBs at the same time.


EVx supports a single address space number via the PTE[ASM] bit. PALcode which supports 
writes to the architecturally-defined TBIAP register does so by using the hardware-specific 
HW_MTPR instruction to write to the hardware-specific DTBASM register. This has the 
effect of invalidating DTB entries which do not have their corresponding ASM bit set. 


For load and store instructions, the effective 43-bit virtual address is presented to the DTBs. 
If the PTE of the supplied virtual address is cached in either DTB, the PFN and protection 
bits for the page which contains the address are used by the Abox to complete the address 
translation and access checks. 


The DTBs are filled and maintained by PALcode. The chapter on PALcode details the DTB 
miss flow. Note that the DTBs can be filled in kernel mode by setting the HWE bit in the 
ICCSR IPR.


2.5.2 BIU


The BIU controls the interface to the EVx pin bus. It responds to three classes of
CPU-generated requests: Dcache fills, Icache fills and write buffer-sourced commands. The BIU
resolves simultaneous internal requests using a fixed priority scheme in which Dcache fill
requests are given highest priority, followed by Icache fill requests. Write buffer requests
have the lowest priority. The external interface chapter of this specification describes the
EVx pin bus.

The BIU contains logic to directly access an external cache to service internal cache fill 
requests and writes from the write buffer. The BIU services reads and writes which do not 
hit in the external cache with help from external logic. 


Internal data transfers between the CPU and the BIU are made via a 64-bit bidirectional 
bus. Since the internal cache fill block size is 32 bytes, cache fill operations result in four 
data transfers across this bus from the BIU to the appropriate cache. Also, since each write 
buffer entry is 32 bytes wide, write transactions may result in four data transfers from the 
write buffer to the BIU. 


2.5.3 Load Silos 


The Abox contains a fully folded memory reference pipeline which may accept a new load or 
store instruction every cycle until a Dcache fill is required. Since the Dcache lines are only
allocated on load misses, the Abox may accept a new instruction every cycle until a load miss 
occurs. When a load miss occurs the Ibox stops issuing all instructions that use the load port 
of the register file or are otherwise handled by the Abox (LDx, STx, MFPR, JSR, RCC, RS, 
RC), MB and SYNC instructions. A JSR with a destination of R31 may be issued. 


Since the result of each Dcache lookup is known late in the pipeline (stage [6]) and instructions
are issued in pipe stage [3], there may be two instructions in the Abox pipeline behind a load 
instruction which misses the Dcache. These two instructions are handled as follows: 


• Loads which hit the Dcache are allowed to complete - hit under miss.

• Load misses are placed in a silo and replayed in order after the first load miss completes.

• Store instructions are presented to the Dcache at their normal time with respect to the
  pipeline. They are silo’ed and presented to the write buffer in order with respect to load
  misses.


When a load miss occurs in EV3 the Ibox stops issuing Abox-directed instructions until all
pending Dcache fills are complete. This ensures that no conflicts for the Dcache will occur.


In order to improve performance in EV4, the Ibox is allowed to restart the execution of
Abox-directed instructions before the last pending Dcache fill is complete. Dcache fill
transactions result in four data transfers from the BIU to the Dcache. These transfers may
each be separated by one or more cycles depending on the characteristics of the external cache
and memory subsystems. The BIU attempts to send the quadword of the fill block which the
CPU originally requested in the first of these four transfers (it is always able to accomplish
this for reads which hit in the external cache). Therefore the pending load instruction which
requested the Dcache fill can complete before the Dcache fill finishes. In EV4, Dcache fill
data is not written into the cache array as it is received from the BIU. Rather, it accumulates
one quadword at a time into a "pending fill" latch. When the load miss silo is empty and the
requested quadword for the last outstanding load miss is received, the Ibox resumes execution
of Abox-directed instructions despite the still-pending Dcache fill. When the entire cache line
has been received from the BIU, it is written into the Dcache data array whenever the array
isn’t otherwise busy with a load or a store.


2.5.4 Write Buffer


The Abox contains a write buffer for two purposes: 


1. To minimize the number of CPU stall cycles by providing a high bandwidth (but finite) 
resource for receiving store data. This is required since EVx can generate store data at 
the peak rate of one quadword every CPU cycle which is greater than the rate at which 
the external cache subsystem can accept the data. 


2. To attempt to aggregate store data into aligned 32-byte cache blocks for the purpose of 
maximizing the rate at which data may be written from EVx into the external cache. 


In addition to store instructions, MB, STQ/C, STL/C, FETCH and FETCH_M instructions 
are also written into the write buffer and sent off-chip. Unlike stores, however, these write 
buffer-directed instructions are never merged into a write buffer entry with other instructions. 


Each write buffer entry contains a CAM for holding physical address bits <33:5>, four 
quadwords of data, eight longword mask bits which indicate which of the associated eight 
longwords in the entry contain valid data, and miscellaneous control bits. 


To facilitate the discussion, the following two states are defined: invalid and valid. A write
buffer entry is invalid if it does not contain one of the above-listed write buffer-directed
commands. A write buffer entry is valid if it contains one of the above-listed write
buffer-directed commands.


The write buffer contains two pointers: the head pointer and the tail pointer. The head 
pointer points to the valid write buffer entry which has been valid the longest period of time. 
The tail pointer points to the invalid write buffer entry slot which will next be validated. If 
the write buffer is completely full (empty) the head and tail pointers point to the same valid 
(invalid) entry. 


Each time the write buffer is presented with a store instruction the physical address generated 
by the instruction is compared to the address in each valid write buffer entry. If the address 
is in the same aligned 32-byte block as an address in a valid write buffer entry which also 
contains a store then the new store data is merged into that entry, and the entry’s longword 
mask bits are updated. If no matching address is found in the write buffer, then the store data 
is written into the entry designated by the tail pointer, the entry is validated, and the tail 
pointer is incremented to the next entry. Note this scheme does not maintain write-ordering. 
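

The merge-or-allocate behavior described above can be sketched as a small model. This is
an illustration under stated assumptions, not the hardware design: the class WriteBuffer
and its method names are hypothetical, only store entries are modeled (no MB, STx/C or
FETCH commands), store data is omitted, stores are assumed to be naturally aligned
longwords or quadwords, and a full buffer simply rejects the store because flow control
is described separately.

```python
class WriteBuffer:
    """Toy model of address matching and longword-mask merging."""

    def __init__(self, num_entries=4):
        self.entries = [None] * num_entries  # None marks an invalid entry
        self.tail = 0                        # next invalid slot to validate

    def store(self, paddr, size):
        """Record a store of `size` bytes (4 or 8) at physical address `paddr`.

        Returns the index of the entry used, or None if the buffer is full.
        """
        block = paddr >> 5                   # aligned 32-byte block address
        first_lw = (paddr & 0x1F) >> 2       # longword index within the block
        lw_mask = 0
        for i in range(size // 4):
            lw_mask |= 1 << (first_lw + i)

        # Compare against every valid entry; merge on a block match.
        for idx, entry in enumerate(self.entries):
            if entry is not None and entry["block"] == block:
                entry["mask"] |= lw_mask
                return idx

        # No match: validate the entry at the tail pointer, if it is free.
        if self.entries[self.tail] is not None:
            return None                      # buffer full
        idx = self.tail
        self.entries[idx] = {"block": block, "mask": lw_mask}
        self.tail = (self.tail + 1) % len(self.entries)
        return idx
```

Because a later store can merge into an older entry, two stores to different blocks may
leave the chip in a different order than they were issued, which is the loss of
write-ordering noted above.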


The EV3 and EV4 write buffers differ in the number of entries they contain, in the flow 
control mechanism used to prevent buffer overflow, and in the mechanism which controls 
when entries are written off-chip. 


2.5.4.1 EV3 Write Buffer 


The EV3 write buffer has eight entries and employs a rather simple flow control mechanism 
to prevent the buffer from overflowing. The physical address of each store instruction is 
presented to the write buffer CAM array in the second half of pipe stage [6], and the decision 
as to whether the store data can be merged with an existing entry or whether a new entry 
will be required is made in the first half of pipe stage [7]. Write buffer overflow is prevented 
by causing the Ibox to stall the execution of store instructions if necessary. Since the write 
buffer merge decision is made in pipe stage [7], and instructions are issued from pipe stage 
[3], there may be as many as three store instructions in the Abox pipeline behind a store 
instruction which causes a new buffer entry to be consumed. Therefore, in order to prevent 
overflow the Ibox stops issuing store instructions whenever there are three or fewer invalid 
write buffer entries available.


In EV3, the write buffer attempts to unload the head entry whenever it is valid. Store data
may get merged into this entry up to the time the entry starts getting sent to the BIU.


2.5.4.2 EV4 Write Buffer 


The EV4 write buffer contains four entries but employs a more complicated flow control
mechanism which allows its entries to be better utilized than in EV3. In EV4 the Ibox issues
store instructions irrespective of whether the write buffer is full. If a store instruction enters
pipe stage [6] of the Abox and the write buffer is full, the Ibox is forced to stop issuing both
loads and stores by the same mechanism which is used for handling load misses. In effect,
the store instruction gets treated as if it were a load miss. Any valid instructions in pipe
stages [4] or [5] get handled exactly as if they had followed a load miss - loads which hit the
Dcache are allowed to complete, stores are presented to the Dcache, placed into the Abox silo
and presented to the write buffer in order with respect to other silo’ed instructions. The
Abox silo control logic ensures that no stores are lost when the write buffer is full by retrying
silo’ed stores until they are accepted by the write buffer.


In EV4, the write buffer attempts to send its head entry off-chip by requesting the BIU when
one of the following conditions is met:


1. The write buffer contains at least two valid entries.


2. The write buffer contains one valid entry and at least 256 CPU cycles have elapsed since
the execution of the last write buffer-directed instruction.


3. The write buffer contains an MB instruction.


4. The write buffer contains a STQ/C or STL/C instruction.


5. A load miss is pending which requires the write buffer to be flushed before an external
read is launched to service the load miss.
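

The EV4 unload conditions listed above amount to a simple predicate. The sketch below is
illustrative only: the function name and parameter names are hypothetical, and treating an
empty buffer as never requesting the BIU is an assumption made here, since there is no
head entry to send.

```python
def ev4_write_buffer_requests_biu(valid_entries,
                                  cycles_since_last_wb_instruction,
                                  holds_mb,
                                  holds_store_conditional,
                                  load_miss_requires_flush):
    """Return True if the write buffer should request the BIU for its head entry."""
    if valid_entries == 0:
        # Assumption: nothing to unload from an empty buffer.
        return False
    if valid_entries >= 2:
        return True
    if cycles_since_last_wb_instruction >= 256:
        # Exactly one valid entry and 256 or more idle cycles.
        return True
    # An MB, an STQ/C or STL/C, or a load miss that needs the buffer flushed.
    return holds_mb or holds_store_conditional or load_miss_requires_flush
```

A single valid store entry therefore lingers for up to 256 cycles, giving later stores a
chance to merge into it, unless one of the other conditions forces it out sooner.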


When the write buffer is requesting the BIU no stores are allowed to merge into the write
buffer’s head entry.


2.6 Fbox 


EV4 has an on-chip pipelined Fbox capable of executing both DEC and IEEE floating point 
instructions. IEEE floating point datatypes S and T are supported with all rounding modes 
except round to +/- infinity which is provided in PALcode. DEC floating point datatypes F and 
G are fully supported with limited support for D floating format. The Fbox contains a 32-entry 
64-bit floating point register file and a user accessible control register, FP_CTL, containing 
round mode controls, trap enables, and exception flag information. The Fbox can accept an 
instruction every cycle, with the exception of floating point divide instructions. The latency 
for data dependent, non divide instructions is six cycles. Bypassers are provided to allow 
issue of instructions which are dependent on prior results while those results are written to 
the register file. For detailed information on instruction timing, refer to Section 2.9. 


For divide instructions, the Fbox does not compute the inexact flag. Consequently, the INE 
exception flag in the FP_CTL register is never updated for any DIV instructions. This is a
known incompatibility in the EV4 chip. 


The EV3 chip contains no on-chip floating point hardware. Floating point instructions can be
emulated in PALcode for EV3.


2.7 Cache Organization


EV3 and EV4 each include two on-chip caches. All memory cells are fully static CMOS 6T 
structures. 


2.7.1 Data Cache 


The EV4 data cache, Dcache, contains 8Kbytes. It is a write-through, direct mapped,
read-allocate physical cache and has 32-byte blocks. System components may keep the Dcache
coherent with memory by using the invalidate bus described in the pin bus section of this
specification.


The EV3 data cache contains 1Kbytes. 


2.7.2 Instruction Cache 


The EV4 instruction cache, Icache, is an 8Kbyte physical direct-mapped cache. Icache blocks,
or lines, contain 32 bytes of instruction stream data with associated tag as well as a six-bit
ASN field, a one-bit ASM field and an eight-bit branch history field per block. It does not
contain hardware for maintaining coherency with memory and is unaffected by the invalidate
bus.


EV4 also contains a single-entry Icache stream buffer which together with its supporting 
logic reduces the performance penalty due to Icache misses incurred during in-line instruction 
processing. The stream buffer physically consists of latches for one Icache block’s data and tag 
bits which are adjacent to the fill-side of the cache array, and a comparator, 13-bit incrementer 
and associated datapath hardware and control in the Abox. 


When an Icache miss occurs, the Ibox sends an Icache fill request to the Abox, which 
simultaneously requests the BIU and checks the stream buffer for the requested block. If 
the block is present in the stream buffer the Abox aborts the original Icache fill request, 
writes the requested block into the Icache and launches a prefetch request to the BIU for the 
next consecutive Icache block. The Ibox does not interact with the stream buffer - from the 
Ibox’s perspective Icache misses which hit the stream buffer are the same as any other Icache 
miss except that the Icache fill finishes sooner. 


When an Icache miss also misses the stream buffer the Abox launches a request for the 
required fill block and subsequently launches a prefetch request for the next consecutive fill 
block, thus getting the stream buffer started down the next I-stream path. Stream buffer 
prefetch requests never cross physical page boundaries, but instead wrap around to the first
block of the current page.
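

The wrap-around of stream buffer prefetch addresses can be illustrated with a little
address arithmetic. This is a sketch, assuming 8Kbyte physical pages and 32-byte Icache
blocks as stated in this chapter; the function name next_prefetch_block is hypothetical
and the code models only the address calculation, not the Abox datapath.

```python
PAGE_SIZE = 8 * 1024   # bytes per physical page
BLOCK_SIZE = 32        # bytes per Icache fill block


def next_prefetch_block(paddr):
    """Return the block-aligned physical address of the next prefetch.

    The page frame bits are held constant, so incrementing past the last
    block of a page wraps to the first block of the same page.
    """
    page_base = paddr & ~(PAGE_SIZE - 1)
    # Increment within the 13-bit page offset, then align to a block boundary.
    offset = (paddr + BLOCK_SIZE) & (PAGE_SIZE - 1) & ~(BLOCK_SIZE - 1)
    return page_base | offset
```

For a block at 0x3FE0, the last block of the page starting at 0x2000, the next prefetch
address is 0x2000 rather than 0x4000.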


The EV3 instruction cache contains 1Kbytes. It is a physical direct-mapped cache and has
32-byte blocks. The EV3 chip contains no hardware for keeping the Icache coherent with
memory. Further, it is unaffected by the invalidate bus. It does not contain ASN, ASM or
branch history information. 


A physical, incoherent Icache has the following implications: 


1. Software which creates or modifies the instruction stream must execute an IMB PAL call 
before trying to execute the new instructions. The PAL IMB routine must explicitly flush 
the Icache by writing to the FLUSHL_IC register.


2. As virtual pages migrate from one physical page frame to another, the Icache may become
incoherent with memory. A sufficient means of keeping the Icache coherent for this case
is for the PALcode which implements the TBIA and TBIAP PAL calls to explicitly flush
the Icache as described above. The ASN field and supporting PALcode in EV4 provide
functionality to conform to the ALPHA SRM requirements regarding instruction caches
while reducing the need to flush the Icache.


2.8 Pipeline Organization 


EV4 has a seven stage pipeline for integer operate and memory reference instructions. 
Floating point operate instructions progress through a ten stage pipeline. The Ibox maintains 
state for all pipeline stages to track outstanding register writes, and determine Icache 
hit/miss. The pipeline diagrams below show the Ebox, Ibox, Abox and Fbox pipelines. The 
first four cycles are executed in the Ibox and the last stages are box specific. There are 
bypassers in all of the boxes that allow the results of one instruction to be used as operands 
of a following instruction without having to be written to the register file. The following 
section describes the pipeline scheduling rules. 


Integer Operate Pipeline: 


• IF - Instruction Fetch.
• SWAP - Swap Dual Issue Instruction / Branch Prediction.
• I0 - Decode.
• I1 - Register file(s) access / Issue check.
• A1 - Computation cycle 1 / Ibox computes new PC.
• A2 - Computation cycle 2 / ITB look-up.
• WR - Integer register file write / Icache Hit/Miss.


Memory Reference Pipeline: 


[0] [1] [2] [3] [4] [5] [6]

• AC - Abox calculates the effective D-stream address.
• TB - DTB look-up.
• HM - Dcache Hit/Miss and load data register file write.


Floating Point Operate Pipeline:


• F1-F5 - Floating point calculate pipeline.
• FWR - Floating point register file write.


The EV4 integer pipeline divides instruction processing into four static and three dynamic 
stages of execution. The EV4 floating point pipeline maintains the first four static stages 
and adds six dynamic stages of execution. The first four stages consist of the instruction 
fetch, swap, decode and issue logic. These stages are static in that instructions may remain 
valid in the same pipeline stage for multiple cycles while waiting for a resource or stalling for 
other reasons. Dynamic stages always advance state and are unaffected by any stall in the 
pipeline. Pipeline stalls are also referred to as pipeline freezes. A pipeline freeze may occur 
while zero instructions issue, or while one instruction of a pair issues and the second is held 
at the issue stage. A pipeline freeze implies that a valid instruction or instructions is (are)
presented to be issued but cannot proceed.


Upon satisfying all issue requirements, instructions are allowed to continue through any 
pipeline toward completion. After issuing, instructions cannot be held in a given pipe stage. 
It is up to the issue stage to ensure that all resource conflicts are resolved before an instruction
is allowed to continue. The only means of stopping instructions after the issue stage is an
abort condition. Note that the term abort as used here is different from its use in the ALPHA
SRM.


Aborts may result from a number of causes. In general, they may be grouped into two 
classes, namely exceptions (including interrupts) and non exceptions. The basic difference 
between the two is that exceptions require that the pipeline be drained of all outstanding 
instructions before restarting the pipeline at a redirected address. In either case, the pipeline 
must be flushed of all instructions which were fetched subsequent to the instruction which 
caused the abort condition. This includes stopping one instruction of a dual issued pair 
in the case of an abort condition on the first instruction of the pair. The non exception 
case, however, does not need to drain the pipeline of all outstanding instructions ahead
of the aborting instruction. The pipeline can be immediately restarted at a redirected 
address. Examples of non exception abort conditions are branch mispredictions, subroutine 
call/return mispredictions and instruction cache misses. Data cache misses do not produce 
abort conditions but can cause pipeline freezes. 


In the event of an exception, the processor aborts all instructions issued after the excepting 
instruction as described above. Due to the nature of some error conditions, this may occur as 
late as the write cycle. Next, the address of the excepting instruction is latched in the
EXC_ADDR IPR. When the pipeline is fully drained, the processor begins instruction execution at
the address given by the PALcode dispatch. The pipeline is drained when all outstanding 
writes to both the integer and floating point register file have completed and all outstanding 
instructions have passed the point in the pipeline such that all instructions are guaranteed 
to complete without an exception in the absence of a machine check. 


It should be noted that there are two basic reasons for non-issue conditions. The first is a 
pipeline freeze wherein a valid instruction or pair of instructions is prepared to issue but
cannot due to a resource conflict. This type of non-issue cycle can be minimized through
code scheduling. The second type of non-issue condition consists of pipeline bubbles where
there is no valid instruction in the pipeline to issue. Pipeline bubbles exist due to abort
conditions as described above. In addition, a single pipeline bubble is produced whenever
a branch type instruction is predicted to be taken, including subroutine calls and returns. 
Pipeline bubbles are reduced directly by the hardware through bubble squashing, but can 
also be effectively minimized through careful coding practices. Bubble squashing involves 
the ability of the first four pipeline stages to advance whenever a bubble is detected in the 
pipeline stage immediately ahead of it while the pipeline is otherwise frozen. 
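

Bubble squashing can be pictured with a toy model of the four static stages. The sketch
below is one interpretation of the description above, not the actual control logic: the
function name squash_bubbles is hypothetical, the stages are modeled as a list ordered
from fetch to issue, and None represents a bubble. Refilling the fetch stage is not
modeled.

```python
def squash_bubbles(stages):
    """Advance the static stages by one cycle while the issue stage is frozen.

    stages: list ordered [IF, SWAP, I0, I1]; None marks a bubble.
    An instruction moves forward only into a bubble directly ahead of it,
    so the instruction held at the issue stage never moves.
    """
    nxt = list(stages)
    # Walk from the issue stage back toward fetch so that an instruction
    # advances at most one stage per cycle.
    for i in range(len(nxt) - 1, 0, -1):
        if nxt[i] is None and nxt[i - 1] is not None:
            nxt[i], nxt[i - 1] = nxt[i - 1], None
    return nxt
```

With instruction A frozen at the issue stage and a bubble in the decode stage,
squash_bubbles(["C", "B", None, "A"]) moves B and C forward one stage each, so the bubble
is removed from ahead of them instead of being carried to the issue stage.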


2.9 Scheduling and Issuing Rules 
2.9.1 Instruction Class Definition 


It is important to note that the following scheduling and dual issue rules are only performance 
related. There are no functional dependencies related to scheduling or dual issuing. The 
scheduling and issuing rules are defined in terms of instruction classes. The table below 
specifies all of the instruction classes and the box which executes the particular class. 


Table 2-1: Producer-Consumer Classes


Class Name   Box    Instruction List

LD           Abox   all loads, (MFPR, RCC, RS, RC, STC producers
                    only), (FETCH consumer only)

ST           Abox   all stores, MTPR

IBR          Ebox   integer conditional branches

FBR          Fbox   floating point conditional branches

JSR          Ebox   jump to subroutine instructions JMP, JSR, RET, or
                    JSR_COROUTINE, (BSR, BR producer only)

IADDLOG      Ebox   ADDL ADDL/V ADDQ ADDQ/V SUBL SUBL/V
                    SUBQ SUBQ/V S4ADDL S4ADDQ S8ADDL
                    S8ADDQ S4SUBL S4SUBQ S8SUBL S8SUBQ
                    LDA LDAH AND BIS XOR BIC ORNOT EQV

SHIFTCM      Ebox   SLL SRL SRA EXTQL EXTLL EXTWL EXTBL
                    EXTQH EXTLH EXTWH MSKQL MSKLL MSKWL
                    MSKBL MSKQH MSKLH MSKWH INSQL INSLL
                    INSWL INSBL INSQH INSLH INSWH ZAP
                    ZAPNOT CMOVEQ CMOVNE CMOVLT CMOVLE
                    CMOVGT CMOVGE CMOVLBS CMOVLBC

ICMP         Ebox   CMPEQ CMPLT CMPLE CMPULT CMPULE
                    CMPBGE

IMULL        Ebox   MULL MULL/V

IMULQ        Ebox   MULQ MULQ/V UMULH

FPOP         Fbox   floating point operates except divide

FDIV         Fbox   floating point divide


2.9.2 Producer-Consumer Latency Matrix


EV3 and EV4 enforce the same issue rules regarding producer/consumer latencies in all cases 
except FPOP-FST in which EV4 is two cycles faster. In fact, floating point code will produce 
almost identical timing, although no floating point data, between EV3 and EV4 when run 
with the FPE bit of the ICCSR set. FDIV instructions, however, should never be issued 
on EV3 because they will not be signaled as complete and therefore prevent any dependent 
instruction from issuing. 


The scheduling rules are described as a producer-consumer matrix. Each row and column in
the matrix is a class of ALPHA instructions. A ’1’ in the Producer-Consumer Latency Matrix
indicates one cycle of latency. A one cycle latency means that if instruction B uses the results
of instruction A, then instruction B may be issued ONE cycle after instruction A is issued.


The first thing to do when determining latency for a given instruction sequence is to identify
the classes of all the instructions. The example below has the classes listed in the comment
field.


ADDQ R1, R2, R3     ! IADDLOG class
SRA  R3, R4, R5     ! SHIFT class
SUBQ R5, R6, R7     ! IADDLOG class
STQ  R7, D(R10)     ! ST class


The SRA instruction consumes the result (R3) produced by the ADDQ instruction. The latency 
associated with an iadd-shift producer-consumer pair as specified by the matrix is one. That 
means that if the ADDQ was issued in cycle ’n’ the SRA could be issued in cycle ’n+1’. The 
SUBQ instruction consumes the result (R5) produced by the SRA instruction. The latency 
associated with a shift-iadd producer-consumer pair as specified by the matrix is two. That 
means that if the SRA was issued in cycle ’n’ the SUBQ could be issued in cycle ’n+2’. The 
Ibox injects one nop cycle in the pipeline for this case. 


The final case has the STQ instruction consuming the result (R7) produced by the SUBQ 
instruction. The latency associated with an iadd-st producer-consumer pair where the result 
of the iadd is the store data is zero. This means that the SUBQ and STQ instruction pair 
can be dual-issued.


                                 Producer Class

Consumer    LD    JSR   IADD   SHIFT  ICMP   IMULL  IMULQ  FPOP   FDIV   FDIV
Class       (1)         LOG    CM            (3)    (3)           F/S    G/T
                                                                  (4)    (4)
---------   ---   ---   ----   -----  ----   -----  -----  ----   -----  -----
LD          3     3     2      2      2      21     23     X      X      X
ST (2)      3     3     2/0    2/0    2/0    21/20  23/22  X/4    X/32   X/61
IBR         ?     ?     ?      ?      ?      ?      ?      ?      ?      ?
JSR         ?     ?     ?      ?      ?      ?      ?      ?      ?      ?

IADDLOG     ?     ?     ?      2      ?      ?      ?      ?      ?      ?
SHIFTCM     ?     ?     1      ?      ?      ?      ?      ?      ?      ?
ICMP        ?     ?     ?      ?      ?      ?      ?      ?      ?      ?
IMUL        ?     ?     ?      ?      ?      21/19  23/21  ?      ?      ?

FBR         3     X     X      X      X      X      X      6      34     63
FPOP        3     X     X      X      X      X      X      6      34     63
FDIV        3     X     X      X      X      X      X      6      34/30  63/59
Notes: 


1. For loads, Dcache hit is assumed. The latency for a Dcache miss and an external cache
hit is dependent on the system configuration. The latency is determined as the register 
file write time less 1 cycle. 


2. For some producer classes, two latencies, X/Y, are given with the ST consumer class. X 
represents the latency for base address of store and Y represents the latency for store 
data. FDIV results cannot be used as the base address for store operations. 


3. For IMUL followed by IMUL, there are two latencies given. The first represents the 
latency with data dependency, i.e. the second IMUL uses the result from the first. The 
second is the multiply latency without data dependencies. 


4. For FDIV followed by FDIV, there are two latencies given. The first represents the latency 
with data dependency, i.e. the second FDIV uses the result from the first. The second is 
the division latency without data dependencies. 
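

As a rough illustration of how the matrix is applied, the Python sketch below recomputes the
issue cycles of the four-instruction example in this section. It is not a model of the Ibox:
the LATENCY table holds only the three producer-consumer pairs that the text quotes, the
function, field and label names (including ST_DATA and ST_BASE) are invented for this
sketch, and it assumes in-order issue with no limit other than operand latency.

```python
# Latencies (in cycles) quoted in the worked example; NOT the full matrix.
# Keys are (producer class, consumer class). "ST_DATA" is a label invented here
# for the store-data operand of an ST consumer (the Y value of note 2).
LATENCY = {
    ("IADDLOG", "SHIFT"): 1,
    ("SHIFT", "IADDLOG"): 2,
    ("IADDLOG", "ST_DATA"): 0,
}


def earliest_issue_cycles(program):
    """Return [(name, cycle)] for a list of (name, producer_class, dest, uses).

    uses is a list of (source_register, consumer_class) pairs. Assumes in-order
    issue: an instruction never issues before the one ahead of it.
    """
    producers = {}       # register -> (issue cycle, producer class)
    schedule = []
    previous_cycle = 0
    for name, producer_class, dest, uses in program:
        cycle = previous_cycle
        for register, consumer_class in uses:
            if register in producers:
                issued, produced_by = producers[register]
                cycle = max(cycle, issued + LATENCY[(produced_by, consumer_class)])
        schedule.append((name, cycle))
        if dest is not None:
            producers[dest] = (cycle, producer_class)
        previous_cycle = cycle
    return schedule


EXAMPLE = [
    ("ADDQ", "IADDLOG", "R3", [("R1", "IADDLOG"), ("R2", "IADDLOG")]),
    ("SRA", "SHIFT", "R5", [("R3", "SHIFT"), ("R4", "SHIFT")]),
    ("SUBQ", "IADDLOG", "R7", [("R5", "IADDLOG"), ("R6", "IADDLOG")]),
    ("STQ", None, None, [("R7", "ST_DATA"), ("R10", "ST_BASE")]),
]

print(earliest_issue_cycles(EXAMPLE))
```

With ADDQ issued in cycle 0, the sketch places SRA in cycle 1 and both SUBQ and STQ in
cycle 3, which agrees with the n+1, n+2 and dual-issue statements in the text.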


2.9.3 Producer-Producer Latency 


Producer-producer latencies, also known as write after write conflicts, are restricted only by the
register write order. For most instructions, this is dictated by issue order; however, IMUL,
FDIV and LD instructions may require more time than other instructions to complete and 
therefore must stall following instructions that write the same destination register to preserve 
write ordering. In general, only cases involving an intervening producer-consumer conflict are 
of interest. They can occur commonly in a dual issue situation when a register is reused. In 
these cases, producer-consumer latencies are equal to or greater than the required producer- 
producer latency as determined by write ordering and therefore dictate the overall latency. 




An example of this case is shown in the code: 


LDQ R2,D(R0) ; R2 destination
ADDQ R2,R3,R4 ; wr-rd conflict stalls execution waiting for R2 
LDQ R2,D(R1) ; wr-wr conflict may dual issue when addq issues 


2.9.4 EVx Issue Rules


The following is a list of conditions that prevent both EV3 and EV4 from issuing an
instruction.


1. No instruction can be issued until all of its source and destination registers are clean,
i.e. all outstanding writes to the destination register are guaranteed to complete in issue
order and there are no outstanding writes to the source registers or those writes can be
bypassed.


2. No LD, ST, FETCH, MB, RCC, RS, RC, DRAINT, HW_MXPR or BSR, BR, JSR (with
destination other than R31) can be issued after an MB instruction until the MB has been
acknowledged on the external pin bus.


3. No IMUL instructions can be issued if the integer multiplier is busy.


4. No SHIFT, IADDLOG, ICMP or ICMOV instruction can be issued exactly three cycles
before an integer multiplication completes.


5. No integer or floating point conditional branch instruction can be issued in the cycle
immediately following a JSR, JMP, RET, JSR_COROUTINE or HW_REI instruction.


6. No DRAINT instruction can be issued as the second instruction of a dual issue pair.


2.9.4.1 EV3 Specific Issue Rules 
The following rules are specific to EV3. 


1. No LD instructions can be issued in the two cycles immediately following any store
instruction.


2. No LD, ST, FETCH, MB, RCC, RS, RC, DRAINT, HW_MXPR or BSR, BR, JSR (with
destination other than R31) instruction can be issued after a load miss until all pending
D-stream fills have been completed.


3. No ST, MB, FETCH or FETCH_M instruction can be issued when the write buffer is full.


4. EV3 does not contain an on-chip floating point unit, therefore if the FPE bit of the ICCSR
is set, any instruction that attempts to use the results of an FDIV instruction will not
issue. Ever. Only reset will clear this condition.


2.9.4.2 EV4 Specific Issue Rules
The following rules are specific to EV4.


1. No LD instructions can be issued in the two cycles immediately following a STC.


2. No LD, ST, FETCH, MB, RCC, RS, RC, DRAINT, HW_MXPR or BSR, BR, JSR (with
destination other than R31) instruction can be issued when the Abox is busy due to a
load miss or write buffer overflow. For more information see section 2.5.3.


3. No FDIV instruction can be issued if the floating point divider is busy.


4. No floating point operate instruction can be issued exactly five or exactly six cycles before
the floating point divide completes.
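

The two "exactly N cycles before completion" rules (the integer multiply rule common to
EV3 and EV4, and the EV4 floating point divide rule) can be read as small forbidden issue
windows. The Python sketch below is one possible reading, not the Ibox logic: the function
name and its arguments are invented here, cycle numbers are arbitrary integers, and it
assumes that "floating point operate" corresponds to the FPOP class only.

```python
# Classes named by the integer multiply rule.
INTEGER_OPERATE = {"SHIFT", "IADDLOG", "ICMP", "ICMOV"}


def blocked_by_completion(op_class, issue_cycle, imul_done=None, fdiv_done=None):
    """Return True if op_class may not issue in issue_cycle.

    imul_done / fdiv_done are the cycles in which a pending integer multiply or
    floating point divide completes (None if nothing is pending).
    """
    if imul_done is not None and op_class in INTEGER_OPERATE:
        if imul_done - issue_cycle == 3:          # exactly three cycles before
            return True
    if fdiv_done is not None and op_class == "FPOP":   # assumption: FPOP only
        if fdiv_done - issue_cycle in (5, 6):     # exactly five or six before
            return True
    return False
```

For example, with a multiply completing in cycle 21, an IADDLOG instruction is held off in
cycle 18 only; it may issue in cycle 17 or 19 as far as this rule is concerned.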


2.9.5 Dual Issue Rules 


The table below lists the classes of instruction pairs that can be issued in a single cycle. An 
instruction from a class in the first column below may be issued in the same cycle as an 
instruction from a class in the second column, in the absence of data dependencies and if the 
two instructions occupy the same aligned quadword in memory. 


Table 2-2: Dual Issue Rules


Instruction 1      Instruction 2

LD integer         LD floating pt
LD floating pt     LD integer
ST floating pt     ST integer
FBR                IBR
IADDLOG            FPOP
SHIFT              FDIV
ICMP               JSR
ICMOV              BSR
IMUL               BR
HW_x
CALL_PAL

Exceptions:


• No more than one of LD, ST, HW_MXPR, FETCH, RCC, RS, RC, MB, DRAINT, HW_REI,
BSR, BR or JSR can be issued in the same cycle.


• No more than one of JSR, IBR, BSR, HW_REI, BR or FBR can be issued in the same
cycle.
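

The pairing rule of Table 2-2 and its two exceptions can be sketched as a predicate. The
Python below is an illustration under stated assumptions, not the hardware rule set: the
function and set names are invented, a pair is accepted in either order, HW_x and CALL_PAL
are left out because this sketch does not assume which column they belong to, and
addresses are byte addresses.

```python
# Classes as listed in the two columns of Table 2-2 (HW_x, CALL_PAL omitted).
COLUMN_1 = {"LD integer", "LD floating pt", "ST floating pt", "FBR",
            "IADDLOG", "SHIFT", "ICMP", "ICMOV", "IMUL"}
COLUMN_2 = {"LD floating pt", "LD integer", "ST integer", "IBR",
            "FPOP", "FDIV", "JSR", "BSR", "BR"}

# The two "no more than one of ..." exception groups, by base class name.
ABOX_OR_JUMP = {"LD", "ST", "HW_MXPR", "FETCH", "RCC", "RS", "RC", "MB",
                "DRAINT", "HW_REI", "BSR", "BR", "JSR"}
BRANCHES = {"JSR", "IBR", "BSR", "HW_REI", "BR", "FBR"}


def base_class(name):
    """'LD integer' -> 'LD'; names without a qualifier are returned unchanged."""
    return name.split()[0]


def can_dual_issue(first, second, addr_first, addr_second, dependent=False):
    """Return True if the two instructions may issue in the same cycle."""
    if dependent:                                  # data dependency between them
        return False
    if addr_first // 8 != addr_second // 8:        # not the same aligned quadword
        return False
    in_table = ((first in COLUMN_1 and second in COLUMN_2) or
                (first in COLUMN_2 and second in COLUMN_1))
    if not in_table:
        return False
    a, b = base_class(first), base_class(second)
    for group in (ABOX_OR_JUMP, BRANCHES):
        if a in group and b in group:              # exception: at most one per cycle
            return False
    return True
```

Under these assumptions an IADDLOG/FPOP pair in one quadword dual issues, while two
loads or an FBR/IBR pair do not, because the exception lists override the column pairing.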
Chapter 3
Privileged Architecture Library Code 


3.1 Introduction 


In a family of machines both users and operating system implementers require functions 
to be implemented consistently. When functions are implemented to a common interface, 
the code that uses those functions can be used on several different implementations without 
modification. 


These functions range from the binary encoding of the instructions and data, to the exception 
mechanisms and synchronization primitives. Some of these functions can be cost effectively 
implemented in hardware, but several are impractical to implement directly in hardware. 
These functions include low-level hardware support functions such as translation buffer 
fill routines, interrupt acknowledge, and exception dispatch. Also included is support for 
privileged and atomic operations that require long instruction sequences such as Return 
from Exception or Interrupt (REI). 


In the VAX architecture, these functions are generally provided by microcode. In EVx, 
there is no microcode. However, an architected interface to these functions that will be
consistent with other members of the ALPHA family of machines is still required. The Privileged
Architecture Library Code (PALcode) is used to implement these functions without resorting 
to a microcoded machine. The EVx hardware development group will provide and maintain 
a version of the PALcode for EVx. Module development groups will have to provide and 
maintain module specific modifications to the PALcode. 


3.2 PAL Environment 


PALcode runs in an environment with privileges enabled, instruction stream mapping 
disabled, and interrupts disabled. The enabling of privileges allows all functions of the 
machine to be controlled. Disabling of instruction stream mapping allows PALcode to be 
used to support the memory management functions (e.g., translation buffer miss routines
cannot be run via mapped memory). PALcode can perform both virtual and physical data
stream references. The disabling of interrupts allows the system to provide multi-instruction 
sequences as atomic operations. The PALcode environment in EVx also includes 32 PAL 
temp registers which are accessible only by PAL reserved move to/from processor register 
instructions.


3.3 Special PAL Instructions


PALcode uses the ALPHA instruction set for most of its operations. EVx maps the architecturally
reserved PALcode opcodes (PALRES0 - PALRES4) to a special load and store (HW_LD,
HW_ST), a move to and move from processor register (HW_MTPR, HW_MFPR), and a return 
from PALmode exception (HW_REI). These instructions produce a Reserved Opcode fault if 
executed while not in the PALcode environment unless the HWE bit of the ICCSR IPR is set, 
in which case these instructions can be executed in kernel mode. 


Register checking and bypassing logic is provided for PALcode instructions as it is for non- 
PALcode instructions when using general purpose registers. Explicit software timing is 
required for accessing the hardware specific IPRs and the PAL_TEMPs. These constraints
are described in the PALmode restriction and IPR sections.


3.3.1 HW_MFPR and HW_MTPR


The internal processor register specified by the PAL, ABX, IBX, and index field is written/read 
with the data from the specified integer register. Processor registers may have side effects 
that happen as the result of writing/reading them. Coding restrictions are associated with 
accessing various registers. Separate bits are used to access Abox IPRs, Ibox IPRs, and
PAL_TEMPs, therefore it is possible for an MTPR instruction to write multiple registers in
parallel if they have the same index.


The HW_MFPR and HW_MTPR instructions have the following format: 


 3        2 2      2 2      1 1        0 0 0 0 0       0
 1        6 5      1 0      6 5        8 7 6 5 4       0
+----------+--------+--------+----------+-+-+-+---------+
|          |        |        |          |P|A|I|         |
|  OPCODE  |   RA   |   RB   |   IGN    |A|B|B|  INDEX  |
|          |        |        |          |L|X|X|         |
+----------+--------+--------+----------+-+-+-+---------+


Table 3-1: HW_MFPR and HW_MTPR Format Description

Field     Description

OPCODE    Is either 25 (HW_MFPR) or 29 (HW_MTPR).

RA/RB     Contain the source, HW_MTPR or destination, HW_MFPR, register number. The RA and
          RB fields must always be identical.

PAL       If set this HW_MFPR or HW_MTPR instruction is referencing a PAL temporary register,
          PAL_TEMP.

ABX       If set this HW_MFPR or HW_MTPR instruction is referencing a register in the Abox.

IBX       If set this HW_MFPR or HW_MTPR instruction is referencing a register in the Ibox.

INDEX     Specifies hardware specific register as shown in Table 3-2.
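

To make the field layout concrete, the Python sketch below packs the fields of Table 3-1
into a 32-bit instruction word. It assumes the bit positions shown in the format diagram
(OPCODE in bits 31..26, RA in 25..21, RB in 20..16, PAL, ABX and IBX in bits 7, 6 and 5,
INDEX in 4..0), reads the opcode values 25 and 29 as decimal, and leaves the IGN bits
zero. The function name is invented for this illustration and is not part of any
assembler.

```python
HW_MFPR_OPCODE = 25   # opcode values as given in Table 3-1, read as decimal
HW_MTPR_OPCODE = 29


def encode_hw_mxpr(opcode, reg, index, pal=0, abx=0, ibx=0):
    """Pack an HW_MFPR/HW_MTPR word; RA and RB both receive reg."""
    if opcode not in (HW_MFPR_OPCODE, HW_MTPR_OPCODE):
        raise ValueError("opcode must be 25 (HW_MFPR) or 29 (HW_MTPR)")
    if not 0 <= reg <= 31 or not 0 <= index <= 31:
        raise ValueError("reg and index are 5-bit fields")
    if any(bit not in (0, 1) for bit in (pal, abx, ibx)):
        raise ValueError("PAL, ABX and IBX are single bits")
    return ((opcode << 26) | (reg << 21) | (reg << 16) |
            (pal << 7) | (abx << 6) | (ibx << 5) | index)


# HW_MTPR R1, HIER: HIER is an Ibox register at index 16 in Table 3-2.
print(hex(encode_hw_mxpr(HW_MTPR_OPCODE, reg=1, index=16, ibx=1)))
```

Setting more than one of pal, abx and ibx selects several registers with the same index at
once, which is how a single MTPR can write multiple registers in parallel.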


The following table indicates how the PAL, ABX, IBX, and INDEX fields are set to access the 
internal processor registers. Setting the PAL, ABX, and IBX fields to zero generates a NOP. 


Table 3-2: IPR Access

Mnemonic          PAL   ABX   IBX   INDEX   Access   Comments

TB_TAG            x     x     1     0       W        PAL mode only
ITB_PTE           x     x     1     1       R/W      PAL mode only
ICCSR             x     x     1     2       R/W
ITB_PTE_TEMP      x     x     1     3       R        PAL mode only
EXC_ADDR          x     x     1     4       R/W
SL_RCV            x     x     1     5       R
ITBZAP            x     x     1     6       W        PAL mode only
ITBASM            x     x     1     7       W        PAL mode only
ITBIS             x     x     1     8       W        PAL mode only
PS                x     x     1     9       R/W
EXC_SUM           x     x     1     10      R/W
PAL_BASE          x     x     1     11      R/W
HIRR              x     x     1     12      R
SIRR              x     x     1     13      R/W
ASTRR             x     x     1     14      R/W
HIER              x     x     1     16      R/W
SIER              x     x     1     17      R/W
ASTER             x     x     1     18      R/W
SL_CLR            x     x     1     19      W
SL_XMIT           x     x     1     29      W
DTB_CTL           x     1     x     0       W
DTB_PTE           x     1     x     2       R/W
DTB_PTE_TEMP      x     1     x     3       R
MMCSR             x     1     x     4       R
VA                x     1     x     5       R
DTBZAP            x     1     x     6       W
DTBASM            x     1     x     7       W
DTBIS             x     1     x     8       W
BIU_ADDR          x     1     x     9       R
BIU_STAT          x     1     x     10      R
DC_ADDR           x     1     x     11      R
DC_STAT           x     1     x     12      R
FILL_ADDR         x     1     x     13      R
ABOX_CTL          x     1     x     14      W
ALT_MODE          x     1     x     15      W
CC                x     1     x     16      W
CC_CTL            x     1     x     17      W
BIU_CTL           x     1     x     18      W
FILL_SYNDROME     x     1     x     19      R
BC_TAG            x     1     x     20      R
FLUSH_IC          x     1     x     21      W
FLUSH_IC_ASM      x     1     x     23      W        EV4 Only
PAL_TEMP[31..0]   1     x     x     31-00   R/W


3.3.2 HW_LD and HW_ST


PALcode uses the HW_LD and HW_ST instructions to access memory outside of the realm
of normal ALPHA memory management. The HW_LD and HW_ST instructions have the
following format:


 3        2 2      2 2      1 1 1 1 1 1                  0
 1        6 5      1 0      6 5 4 3 2 1                  0
+----------+--------+--------+-+-+-+-+--------------------+
|          |        |        |P|A|R|Q|                    |
|  OPCODE  |   RA   |   RB   |H|L|W|W|        DISP        |
|          |        |        |Y|T|C| |                    |
+----------+--------+--------+-+-+-+-+--------------------+


The effective address of these instructions is calculated as follows: 


addr <- (SEXT(DISP) + RB) AND NOT (QW | 11(bin))
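

The address calculation can be sketched in Python as follows. This is an interpretation,
not a statement of the hardware: it assumes that "QW | 11(bin)" denotes the QW bit
concatenated above two one bits (so the low three address bits are cleared for quadword
accesses and the low two for longword accesses), that DISP is the 12-bit signed field of
the instruction, and that the sum wraps at 64 bits. The function names are invented for
this sketch.

```python
MASK64 = (1 << 64) - 1


def sign_extend_12(disp):
    """Interpret the low 12 bits of disp as a signed byte displacement."""
    disp &= 0xFFF
    return disp - 0x1000 if disp & 0x800 else disp


def hw_effective_address(rb_value, disp, qw):
    """addr <- (SEXT(DISP) + RB) AND NOT (QW concatenated with binary 11)."""
    low_bits = (int(bool(qw)) << 2) | 0b11      # 0b111 if QW is set, else 0b011
    return ((rb_value + sign_extend_12(disp)) & MASK64) & ~low_bits


# RB = 0x1007 with DISP = -1 (0xFFF): quadword and longword alignment differ.
print(hex(hw_effective_address(0x1007, 0xFFF, qw=1)))
print(hex(hw_effective_address(0x1007, 0xFFF, qw=0)))
```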


Table 3-3: HW_LD and HW_ST Format Description

Field     Description

OPCODE    Is either 27 (HW_LD) or 31 (HW_ST).

RA/RB     Contain register numbers, interpreted in the normal fashion for loads and stores.

PHY       If clear the effective address of the HW_LD or HW_ST is a virtual address. If set then
          the effective address of the HW_LD or HW_ST is a physical address.

ALT       For virtual-mode HW_LD and HW_ST instructions this bit selects the processor mode
          bits which are used for memory management checks. If ALT is clear the current mode
          bits of the PS register are used, while if ALT is set the mode bits in the ALT_MODE IPR
          are used.

          In EV4, physical-mode load-lock and store-conditional variants of the HW_LD and
          HW_ST instructions may be created by setting both the PHY and ALT bits.

RWC       The RWC (read with write check) bit, if set, enables both read and write access checks
          on virtual HW_LD instructions.

QW        The quadword bit specifies the data length. If it is set then the length is quadword. If it
          is clear then the length is longword.

DISP      The DISP field holds a 12-bit signed byte displacement.


3.3.3 HW_REI


The HW_REI instruction uses the address in the Ibox EXC_ADDR IPR to determine the new 
virtual program counter (VPC). Bit zero of the EXC_ADDR indicates the state of the PALmode 
bit on the completion of the HW_REI. If EXC_ADDR bit[0] is set then the processor remains
in PALmode. This allows PALcode to transition from PALmode to non-PALmode. The HW_
REI instruction can also be used to jump from PALmode to PALmode. This allows PAL 
instruction flows to take advantage of the D-stream mapping hardware in EVx, including 
traps. The HW_REI instruction has the following format: 


Note that bits[15..14] contain the branch prediction hint bits. EVx pushes the contents of
the EXC_ADDR register on the JSR prediction stack. Bit[15] must be set to pop the stack to
avoid misalignment. The next address and PALmode bit are calculated as follows:


VPC <- EXC_ADDR AND {NOT 3}
PALmode <- EXC_ADDR[0]
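

The same two assignments, written as a small Python function for illustration. The
function name is invented here, and EXC_ADDR is treated as a plain non-negative integer.

```python
def hw_rei_target(exc_addr):
    """Return (VPC, PALmode) as HW_REI derives them from EXC_ADDR."""
    vpc = exc_addr & ~3               # VPC <- EXC_ADDR AND {NOT 3}
    pal_mode = bool(exc_addr & 1)     # PALmode <- EXC_ADDR[0]
    return vpc, pal_mode


print(hw_rei_target(0x2001))   # bit 0 set: the processor remains in PALmode
```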


Table 3-4: The HW_REI Format Description 


Field Description 
OPCODE The OPCODE field contains 30. 
RA/RB Contain register numbers which should be R31 or a stall may occur. 


3.4 PAL Entry Points 


When an exception or interrupt occurs on EVx the chip first drains the pipeline, loads the 
PC into the EXC_ADDR IPR and then dispatches to one of the exception routines. The 
pipeline is drained when all instructions that update either register file have completed, and 
all instructions that do not update the register files are guaranteed to complete without an 
exception in the absence of a machine check. In addition, EV4 requires that all pending 
Dcache fill operations have completed before dispatch to one of the exception routines. If
multiple exceptions occur, EVx dispatches to the highest priority PAL entry point. The table 
below prioritizes entry points from highest to lowest priority, i.e. the first row in the table 
(reset) has the highest priority. 


The table below defines only the entry point offset, bits [13..0]. The high-order bits of the 
new PC (bits [33..14]) come from the PAL_BASE IPR.
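

A minimal Python sketch of how the dispatch address is formed from these two pieces. It
assumes that the PAL_BASE value is supplied with its bits already in address positions
[33..14]; the function name is invented for this illustration.

```python
OFFSET_MASK = 0x3FFF                              # entry point offset, bits [13..0]
BASE_MASK = ((1 << 34) - 1) & ~OFFSET_MASK        # PAL_BASE contribution, bits [33..14]


def pal_entry_pc(pal_base, entry_offset):
    """Combine PAL_BASE bits [33..14] with an entry point offset in bits [13..0]."""
    return (pal_base & BASE_MASK) | (entry_offset & OFFSET_MASK)


# Any low-order bits in the supplied base value are ignored.
print(hex(pal_entry_pc(0x12345678, 0x03E0)))
```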


Note that PALcode at PAL entry points of higher priority than DTBMISS must unlock possible
MMCSR IPR and VA IPR locks.


Table 3-5: PAL Entry Points

Entry Name        Time            Offset(Hex)       Cause

RESET             anytime         0000

MCHK              pipe_stage[7]   0020              Uncorrected hardware error.

ARITH             anytime         0060              Arithmetic exception.

INTERRUPT         anytime         00E0              Includes corrected hardware error.

D-stream errors   pipe_stage[6]   01E0, 08E0,       See Table 3-6.
                                  09E0, 11E0

ITB_MISS          pipe_stage[5]   03E0              ITB miss.

ITB_ACV           pipe_stage[5]   07E0              I-stream access violation.

CALLPAL           pipe_stage[5]   2000,40,60 thru   256 locations based on instruction[7..0]. If bit[7]
                                  3FE0              equals zero and CM does not equal kernel mode
                                                    then an OPCDEC exception occurs.

OPCDEC            pipe_stage[5]   13E0              Reserved or privileged opcode.

FEN               pipe_stage[5]   17E0              FP op attempted with :
                                                    FP instructions disabled via ICCSR FPE bit
                                                    FP IEEE round to +/- infinity
                                                    FP IEEE with datatype field other than S,T,QW


The PAL entry points assigned to D-stream errors require a bit more explanation. The
hardware recognizes four classes of D-stream memory management errors: bad virtual
address (improper sign extension), DTB miss, alignment error and everything else (ACV,
FOR, FOW). These errors get mapped into four PAL entry points: UNALIGN, DTB_MISS
PAL mode, DTB_MISS Native mode and D_FAULT. Table 3-5 lists the priority of these entry
points as a group with respect to each of the other entry points. Since a particular D-stream
memory reference may generate errors which fall into more than one of the four error classes
which the hardware recognizes, we also must define the priority of each of the D-stream PAL
entry points with respect to the others in the D-stream PAL entry group. Table 3-6 gives
this priority. The PAL entry point 08E0 for Native mode DTB_MISS is only available in EV4.
EV3 provides only one DTB_MISS PAL entry point at address offset 09E0.


Table 3-6: D-stream Error PAL Entry Points

BAD_VA   MISS   UNALIGN   PAL   Other   Offset (Hex)

1        x      0         x     x       01E0   D_FAULT
1        x      1         x     x       11E0   UNALIGN
0        1      x         0     x       08E0   DTB_MISS Native
0        1      x         1     x       09E0   DTB_MISS PAL
0        0      1         x     x       11E0   UNALIGN
0        0      0         x     1       01E0   D_FAULT
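

Table 3-6 reads naturally as a priority decision. The Python sketch below transcribes its
six rows for illustration only: the function name, the argument names and the ev4 switch
are invented here, and the EV3 case simply routes every DTB miss to the single 09E0 entry
point described in the text.

```python
def dstream_entry(bad_va, miss, unalign, pal_mode, other, ev4=True):
    """Return (offset, name) of the D-stream PAL entry point, or None if no error."""
    if bad_va:
        # A bad virtual address dispatches by alignment only; MISS is a don't-care.
        return (0x11E0, "UNALIGN") if unalign else (0x01E0, "D_FAULT")
    if miss:
        if not ev4:
            return (0x09E0, "DTB_MISS")          # EV3: one DTB_MISS entry point
        if pal_mode:
            return (0x09E0, "DTB_MISS PAL")
        return (0x08E0, "DTB_MISS Native")
    if unalign:
        return (0x11E0, "UNALIGN")
    if other:
        return (0x01E0, "D_FAULT")
    return None


# A native-mode DTB miss that is also unaligned dispatches to DTB_MISS Native on EV4.
print(dstream_entry(bad_va=False, miss=True, unalign=True, pal_mode=False, other=False))
```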


3.5 General PALmode Restrictions 


Many of the restrictions involve waiting ’n’ cycles before using the results of PAL instructions.
Inserting ’n’ instructions between the two time-sensitive instructions is the typical method of
waiting for ’n’ cycles. Because EVx can dual issue instructions it is possible to write code that
requires 2*n+1 instructions to wait ’n’ cycles. Due to the resource requirements of individual
instructions, and the EVx hardware design, multiple copies of the same instruction cannot
be dual issued. This fact is used in some of the code examples below.
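

The padding arithmetic in this paragraph can be written down directly. The Python helper
below is only a restatement of the two cases in the text (arbitrary filler instructions
that might dual issue, versus copies of one instruction that never dual issue); its name
and parameter names are invented for this sketch.

```python
def padding_instructions(cycles, may_dual_issue=True):
    """Worst-case number of filler instructions needed to wait `cycles` cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    # Fillers that may pair up need 2*n+1; identical copies issue one per cycle.
    return 2 * cycles + 1 if may_dual_issue else cycles


print(padding_instructions(3))                         # arbitrary fillers
print(padding_instructions(3, may_dual_issue=False))   # copies of one instruction
```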


3.5.1 EVx PAL Restrictions 


1. As a general rule, HW_MTPR instructions require at least 4 cycles to update the selected
IPR. Therefore, at least three cycles of delay must be inserted before using the result of 
the register update. 


Note that only the write followed by read operation requires this software timing. Multiple 
reads, multiple writes, or read followed by write will pipeline properly and do not require 
software timing except for accesses of the TB registers. 


These cycles can be guaranteed by either including 7 instructions which do not use the 
IPR in transition or proving through the dual issue rules and/or state of the machine, that 
at least 3 cycles of delay will occur. As a special case, multiple copies of a HW_MTPR 
instruction, used as a NOP instruction, can be used to pad cycles after the original HW_ 
MTPR. Since multiple copies of the same instruction will never dual issue, the maximum 
number of instructions necessary to insure at least 3 cycles of delay is 3. 


An example of this is : 


HW_MTPR Rx, HIER    ; Write to HIER
HW_MFPR R31, 0      ; NOP mxpr instruction
HW_MFPR R31, 0      ; NOP mxpr instruction
HW_MFPR R31, 0      ; NOP mxpr instruction
HW_MFPR Ry, HIER    ; Read from HIER


The HW_REI instruction uses the ITB if the EXC_ADDR register contains a non PAL
mode VPC, VPC<0> = 0. By the rule above, this implies that at least 3 cycles of delay 
must be included after writing the ITB before executing a HW_REI instruction to exit 
PAL mode. 


Exceptions: 


• The PAL_TEMP register file is treated as a single register under this rule. However,
PAL_TEMP registers may be read after 3 cycles of delay, not 4. This translates to
code of the form:


HW_MTPR Rx, PAL_R0     ; Write PAL temp 0
HW_MFPR R31, 0         ; NOP mxpr instruction
HW_MFPR R31, 0         ; NOP mxpr instruction
HW_MFPR Ry, PAL_R1     ; Read PAL temp 1


• The EXC_ADDR register may be read by a HW_REI instruction only 2 cycles after
the HW_MTPR. This is equivalent to one intervening cycle of delay. This translates
to code of the form:


HW_MTPR Rx, EXC_ADDR   ; Write EXC_ADDR
HW_MFPR R31, 0         ; NOP cannot dual issue with either
HW_REI                 ; Return


2. An MTPR operation to the DTBIS register cannot be bypassed into. In other words, all 
data being moved to the DTBIS register must be sourced directly from the register file. 
One way to insure this is to provide at least 3 cycles of delay before using the result of 
any integer operation (except MUL) as the source of an MTPR DTBIS. Do not use a MUL 
as the source of DTBIS data. Sample code for this operation is : 


ADDQ R1,R2,R3        ; source for DTBIS address
ADDQ R31,R31,R31     ; cannot dual issue with above, 1st cycle of delay
ADDQ R31,R31,R31     ; 2nd cycle of delay
ADDQ R31,R31,R31     ; 3rd cycle of delay
ADDQ R31,R31,R31     ; may dual issue with below, else 4th cycle of delay
HW_MTPR R3,DTBIS     ; R3 must be in register file, no bypass possible


3. When loading the CC register, bits <3:0> must be loaded with zero. Loading non-zero 
values in these bits may cause the count to be inaccurate. 


4. An MTPR DTBIS cannot be combined with an MTPR ITBIS instruction. The hardware 
will not clear the ITB if both the Ibox and Abox IPRs are simultaneously selected. Instead, 
two instructions are needed to clear each TB individually. Code example: 


HW_MTPR Rx,ITBIS
HW_MTPR Ry,DTBIS


5. An MXPR ITB_TAG, ITB_PTE, ITB_PTE_TEMP cannot follow a HW_REI that remains 
in PAL mode. (Address bit<0> of the EXC_ADDR is set) This rule implies that it is not a 
good idea to ever allow exceptions while updating the ITB. If an exception interrupts flow 
of the ITB miss routine and attempts to REI back, and the return address begins with 
a HW_MxPR instruction to an ITB register, and the REI is predicted correctly to avoid 
any delay between the two instructions, then the ITB register will not be written. Code
example:


HW_REI                ; return from interrupt
HW_MTPR R1,ITB_TAG    ; attempts to execute very next cycle, instr ignored


6. The ITB_TAG, ITB_PTE and ITB_PTE_TEMP registers can only be accessed in PAL mode.
If the instructions HW_MTPR or HW_MFPR to/from the above registers are attempted
while not in PAL mode by setting the HWE (hardware enable) bit of the ICCSR, the
instructions will be ignored.


7. Machine check exceptions taken while in PAL mode may load the EXC_ADDR register
with a restart address one instruction earlier than the proper restart address. Some
HW_MxPR instructions may have already completed execution even though the restart
address indicates the HW_MxPR as the return instruction. Re-execution of some
HW_MxPR instructions can alter machine state (e.g. TB pointers, EXC_ADDR register mask).


The mechanism used to stop instruction flow during machine check exceptions causes
the machine check exception to appear as a D-stream fault on the following instruction
in the hardware pipeline. In the event that the following instruction is a HW_MxPR, a
D-stream fault will not abort execution in all cases. Although the EXC_ADDR will be
loaded with the address of the HW_MxPR instruction as if it were aborted, a HW_REI to
this restart address will incorrectly re-execute this instruction.


Machine check service routines should check for MXPR instructions at the return address
before continuing.


8. When writing the PAL_BASE register, exceptions may not occur. An exception occurring
simultaneously with a write to the PAL_BASE may leave the register in a metastable state.
All asynchronous exceptions but reset can be avoided under the following conditions:


PAL mode .................... blocks all interrupts
machine checks disabled ..... blocks I/O error exceptions
                              (via ABOX_CTL reg or MB isolation)
Not under trap shadow ....... avoids arithmetic traps


The trap shadow is defined as :
less than 3 cycles after a non-mul integer operate that may overflow
less than 22 cycles after a MULL/V instruction
less than 24 cycles after a MULQ/V instruction
less than 6 cycles after a non-div fp operation that may cause a trap
less than 34 cycles after a DIVF or DIVS that may cause a trap
less than 63 cycles after a DIVG or DIVT that may cause a trap


9. The sequence MTPR PTE, MTPR TAG is NOT allowed. At least one cycle must be allowed
after an MTPR PTE before the corresponding MTPR TAG instruction.


10. The MCHK exception service routine must check the EXC_SUM register for simultaneous
arithmetic errors. Arithmetic traps will not trigger exceptions a second time after
returning from exception service for the machine check.


11. Three cycles of delay must be inserted between HW_MFPR DTB_PTE and HW_MFPR
DTB_PTE_TEMP. Code example:


HW_MFPR Rx,DTB_PTE         ; reads DTB_PTE into DTB_PTE_TEMP register
HW_MFPR R31,0              ; 1st cycle of delay
HW_MFPR R31,0              ; 2nd cycle of delay
HW_MFPR Ry,DTB_PTE_TEMP    ; read DTB_PTE_TEMP into register file Ry


12. Three cycles of delay must be inserted between HW_MFPR IPTE and HW_MFPR
ITB_PTE_TEMP. Code example:


HW_MFPR Rx,ITB_PTE         ; reads ITB_PTE into ITB_PTE_TEMP register
HW_MFPR R31,0              ; 1st cycle of delay
HW_MFPR R31,0              ; 2nd cycle of delay
HW_MFPR Ry,ITB_PTE_TEMP    ; read ITB_PTE_TEMP into register file Ry


13. The content of the destination register for HW_MFPR Rx,DTB_PTE or HW_MFPR
Rx,ITB_PTE is UNPREDICTABLE.


14. Two HW_MFPR DTB_PTE instructions cannot be issued in consecutive cycles. This 
implies that more than one instruction may be necessary between the HW_MFPR 
instructions if dual issue is possible. Similar restrictions apply to the ITB_PTE register. 


15. Reading the EXC_SUM and BC_TAG registers requires special timing. Refer to
Section 3.8.12 and Section 3.10.7 for specific information.


16. DMM errors occurring one cycle before HW_MxPR instructions to the IPTE will NOT stop 
the TB pointer from incrementing to the next TB entry even though the mxpr instruction 
will be aborted by the DMM error. This restriction only affects performance and not 
functionality. 
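

The trap shadow windows listed under the PAL_BASE restriction lend themselves to a small
lookup. The Python sketch below is illustrative only: the dictionary keys are labels chosen
for this example (the 24-cycle entry is taken to be the quadword multiply with overflow
checking), cycles are counted from the issue of the trapping instruction, and the function
name is invented.

```python
# Shadow length in cycles for each kind of instruction that may trap.
TRAP_SHADOW_CYCLES = {
    "INT_OPERATE": 3,    # non-mul integer operate that may overflow
    "MULL/V": 22,
    "MULQ/V": 24,
    "FP_OPERATE": 6,     # non-div fp operation that may cause a trap
    "DIVF": 34,
    "DIVS": 34,
    "DIVG": 63,
    "DIVT": 63,
}


def in_trap_shadow(cycle, issued):
    """Return True if `cycle` falls in the trap shadow of any earlier instruction.

    issued is an iterable of (kind, issue_cycle) pairs.
    """
    return any(0 <= cycle - issue_cycle < TRAP_SHADOW_CYCLES[kind]
               for kind, issue_cycle in issued)


# A MULL/V issued in cycle 0 shadows cycles 0 through 21.
print(in_trap_shadow(21, [("MULL/V", 0)]), in_trap_shadow(22, [("MULL/V", 0)]))
```

A PAL_BASE write would be scheduled only for a cycle where this check is False, in
addition to the PAL mode and machine check conditions given in the restriction.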


3.5.2 EV3 Specific PALmode Restrictions 


1. HW_MTPR instructions writing the IPRs listed in the first column of Table 3-7 must 
guarantee that HW_MFPR instructions reading the corresponding IPRs in the second 
column cannot be decoded, even if invalid, exactly three cycles following the first HW_ 
MTPR. 


Table 3-7: EV3 IPR Conflicts

MTPR-Write   MFPR-read

ITB_PTE      ITB_PTE_TEMP
ICCSR        ICCSR
EXCSUM       EXCSUM
PS           PS
xIER         HIER
xIER         SIER
xIER         ASTER
xIRR         SLCLR
xIRR         SIRR
xIRR         ASTRR


In other words, it must be insured that at least 3 cycles of deterministic I-stream will 
always follow the first HW_MTPR. A check of this restriction requires knowledge of 
placement within a cache block. Random cache miss data following the HW_MTPR by 3 
cycles could cause metastable conditions on the read bus.


EV3 PAL code avoids this problem by substituting a macro for the HW_MTPR instruction.
The macro adds NOPs before the HW_MTPR, if necessary, to push the HW_MTPR into the
top of a cache block and pads NOPs after the HW_MTPR to insure 3 cycles of deterministic
I-stream.


In addition to the above restrictions, an HW_MFPR ITB_PTE which ’reads’ the ITB_PTE 
cannot be followed three cycles later with the decode, even if invalid, of a HW_MFPR 
ITB_PTE_TEMP which attempts to ’read’ the ITB_PTE_TEMP. 


2. The contents of the EXC_ADDR register must be written before execution of a HW_REI. 
If the EXC_ADDR is not explicitly written after an exception is taken, the register is 
not guaranteed to be properly sign extended. This can cause the HW_REI to result in an 
ACV fault. Note that the register will appear to be sign extended after a read (HW_MFPR 
EXC_ADDR) but is not. A subsequent HW_MTPR is still required. 


Code example: 


exception entry 


        HW_MFPR R1,EXC_ADDR     ; read exc_addr will appear to be sign extended
        HW_MTPR R1,EXC_ADDR     ; write exc_addr to insure sign extend in hardware
        HW_MTPR R31,0           ; NOP delay for one cycle before REI
        HW_REI                  ; return without worry of surprise ACV


3.5.3 EV4 Specific PALmode Restrictions 


1. HW_STC instructions cannot be followed, for two cycles, by any load instruction that may
miss in the Dcache.


2. Updates to the ASN field of the ICCSR IPR require at least 10 cycles of delay before 
entering native mode that may reference the ASN during Icache access. If the ASN field 
is updated in Kernel mode via the HWE bit of the ICCSR IPR, it is sufficient that all 
I-stream references during this time be made to pages with the ASM bit set to avoid use 
of the ASN. 


3.6 Power Up 


The table below lists the state of all the IPRs immediately following reset. The table also 
specifies which IPRs need to be initialized by power-up PALcode. 


Table 3-8: IPR Reset State 


IPR              Reset State     Comments

ITB_TAG          undefined
ITB_PTE          undefined
ICCSR            cleared         Floating point disabled, single issue mode, VAX mode
                                 enabled, ASN = 0, jsr predictions disabled, branch
                                 predictions disabled, branch history table disabled,
                                 performance counters reset to zero, Perf Cnt0(16b) :
                                 Total Issues/2, Perf Cnt1(12b) : Dcache Misses
ITB_PTE_TEMP     undefined
EXC_ADDR         undefined
SL_RCV           undefined
ITBZAP           n/a             PALcode must do a itbzap on reset.
ITBASM           n/a
ITBIS            n/a
PS               undefined       PALcode must set processor status.
EXC_SUM          undefined       PALcode must clear exception summary and exception
                                 register write mask by doing 64 reads.
PAL_BASE         cleared         Cleared on reset.
HIRR             n/a
SIRR             undefined       PALcode must initialize.
ASTRR            undefined       PALcode must initialize.
HIER             undefined       PALcode must initialize.
SIER             undefined       PALcode must initialize.
ASTER            undefined       PALcode must initialize.
SL_XMIT          undefined       PALcode must initialize. Appears on external pin.
DTB_CTL          undefined       PALcode must select between SP/LP dtb prior to any
                                 TB fill.
DTB_PTE          undefined
DTB_PTE_TEMP     undefined
MMCSR            undefined       Unlocked on reset.
VA               undefined       Unlocked on reset.
DTBZAP           n/a             PALcode must do a dtbzap on reset.
DTBASM           n/a
DTBIS            n/a
BIU_ADDR         undefined       Potentially locked.
BIU_STAT         undefined       Potentially locked.
SL_CLR           undefined       PALcode must initialize.
DC_ADDR          undefined       Potentially locked.
DC_STAT          undefined       Potentially locked.
FILL_ADDR        undefined       Potentially locked.
ABOX_CTL         see comments    [11..0] <- ^x0100 Write buffer enabled, machine checks
                                 disabled, correctable read interrupts disabled, Icache
                                 stream buffer disabled, Dcache disabled, forced hit
                                 mode off.
ALT_MODE         undefined
CC               undefined       Cycle counter is disabled on reset.
CC_CTL           undefined
BIU_CTL          see comments    Bcache disabled, parity mode undefined, chip enable
                                 asserts during RAM write cycles, Bcache forced-hit mode
                                 disabled. BC_PA_DIS field cleared. BAD_TCP cleared.
                                 BAD_DP undefined.

                                 Note: The Bcache parameters BC RAM read speed, BC
                                 RAM write speed, BC write enable control, and BC size
                                 are all undetermined on reset and must be initialized
                                 before enabling the Bcache.
FILL_SYNDROME    undefined       Potentially locked.
BC_TAG           undefined       Potentially locked.
PAL_TEMP[31..0]  undefined


PALcode should execute four jsr call instructions to initialize the jsr stack. This is necessary 
to insure deterministic behavior for testers. The following code will initialize the stack once 
the ICCSR [JSE] bit is set. 


stk_1:  BSR     r2,stk_2        ; push RET PC
stk_2:  BSR     r3,stk_3        ; push RET PC
stk_3:  BSR     r4,stk_4        ; push RET PC
stk_4:
        BSR     r1,stk_1        ; push RET PC


3.7 TB Miss Flows


This section describes hardware specific details to aid the PALcode programmer in writing 
ITB and DTB fill routines. These flows were included to highlight trade-offs and restrictions 
between PAL and hardware. The PALcode source that is released with EVx should be 
consulted before any new flows are written. A working knowledge of the ALPHA memory 
management architecture is assumed. 


3.7.1 ITB Miss 


When the Ibox encounters an ITB miss it latches the VPC of the target instruction-stream 
reference in the EXC_ADDR IPR, flushes the pipeline of any instructions following the 
instruction which caused the ITB miss, waits for any other instructions which may be in 
progress to complete, enters PALmode, and jumps to the ITB miss PAL entry point. The 
recommended PALcode sequence for translating the address and filling the ITB is described 
below. 


1. Create some scratch area in the integer register file by writing the contents of a few 
integer registers to the PAL_TEMP register file. 


2. Read the target virtual address from the EXC_ADDR IPR.


3. Fetch the PTE (this may take multiple reads) using a physical-mode HW_LD instruction.
If this PTE’s valid bit is clear report TNV or ACV as appropriate.


4. Since the ALPHA SRM states that translation buffers may not contain invalid PTEs, the
PTE’s valid bit must be explicitly checked by PALcode. Further, since the ITB’s PTE RAM 
does not hold the FOE bit, the PALcode must also explicitly check this condition. If the 
PTE’s valid bit is set and FOE bit is clear, PALcode may fill an ITB entry. 


5. Write the original virtual address to the TB_TAG register using HW_MTPR. This writes 
the TAG into a temp register and not the actual tag field in the ITB. 


6. Write the PTE to the ITB_PTE register using HW_MTPR. This HW_MTPR causes both 
the TAG and PTE fields in the ITB to be written. Note it is not necessary to delay issuing 
the HW_MTPR to the ITB_PTE after the MTPR to the ITB_TAG is issued. 


7. Restore the contents of any modified integer registers to their original state using the
HW_MFPR instruction.


8. Restart the instruction stream using the HW_REI instruction. 
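

The checks in steps 3 and 4 can be sketched as a small decision function. This is a minimal illustration in Python, not the released PALcode: the function name and the returned strings are hypothetical, and the V (bit 0) and FOE (bit 3) positions are assumed from the memory-format PTE in the ALPHA SRM.

```python
# Assumed memory-format PTE bit positions (ALPHA SRM): V = bit 0, FOE = bit 3.
PTE_V = 1 << 0
PTE_FOE = 1 << 3


def itb_miss_action(pte: int) -> str:
    """Return what the ITB miss flow does with a fetched PTE (illustrative)."""
    if not pte & PTE_V:
        # Translation buffers may not hold invalid PTEs.
        return "report TNV or ACV"
    if pte & PTE_FOE:
        # The ITB PTE RAM does not hold FOE, so PALcode checks it itself.
        return "do not fill (FOE set)"
    return "fill ITB"
```

Only a PTE with V set and FOE clear reaches the HW_MTPR writes of steps 5 and 6.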


3.7.2 DTB Miss 


When the Abox encounters a DTB miss it latches the referenced virtual address in the VA 
IPR and other information about the reference in the MMCSR IPR, and locks these registers 
against further modifications. The Ibox latches the PC of the instruction which generated the 
reference in the EXC_ADDR register, drains the machine as described above for ITB misses, 
and jumps to the DTB miss PALcode entry point. Unlike ITB misses, DTB misses may occur 
while the CPU is executing in PALmode. The recommended PALcode sequence for translating 
the address and filling the DTB is described below. 


1. Create some scratch area in the integer register file by writing the contents of a few 
integer registers to the PAL_TEMP register file. 


2. Read the requested virtual address from the VA IPR. Although the act of reading this
register unlocks the VA and MMCSR registers, the MMCSR register only updates when
D-stream memory management errors occur. It therefore will retain information about
the instruction which generated this DTB miss. This may be useful later.


3. Fetch the PTE (may require multiple reads). If the valid bit of the PTE is clear, a TNV
or ACV must be reported unless the instruction which caused the DTB miss was FETCH
or FETCH/M. This can be checked via the opcode field of the MMCSR register. If the
value in this field is 18 (hex), then a FETCH or FETCH/M instruction caused this DTB
miss, and as mandated by the ALPHA SRM, the subsequent TNV or ACV should NOT be
reported. Therefore PALcode should read the value in EXC_ADDR, increment it by four,
write this value back to EXC_ADDR, and do a HW_REI.


4. Write the register which holds the contents of the PTE to the DTB_CTL IPR. This has the 
effect of selecting either the small or large page DTB for subsequent DTB fill operations, 
based on the value contained in the granularity hint field of the PTE. 


5. Write the original virtual address to the TB_TAG register. This writes the TAG into a
temp register and not the actual tag field in the DTB.


6. Write the PTE to the DTB_PTE register. This HW_MTPR causes both the TAG and PTE 
fields in the DTB to be written. Note it is not necessary to delay issuing the HW_MTPR 
to the DTB_PTE after the MTPR to the DTB_TAG is issued. 


7. Restore the contents of any modified integer registers.


8. Restart the instruction stream using the HW_REI instruction.
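

The FETCH special case in step 3 can be sketched as follows. This is a hedged illustration in Python rather than PALcode: the function name and return values are hypothetical, and the opcode is passed in as an argument because the position of the opcode field within MMCSR is not given in this section.

```python
FETCH_OPCODE = 0x18  # opcode value reported in MMCSR for FETCH and FETCH/M


def dtb_miss_action(pte_valid: bool, mmcsr_opcode: int, exc_addr: int):
    """Return (action, new EXC_ADDR) for a DTB miss (illustrative only)."""
    if pte_valid:
        return ("fill DTB", exc_addr)
    if mmcsr_opcode == FETCH_OPCODE:
        # FETCH and FETCH/M must not report TNV or ACV: step over the
        # instruction by advancing the restart address by four bytes.
        return ("skip instruction", exc_addr + 4)
    return ("report TNV or ACV", exc_addr)
```

In the skip case the incremented address is what PALcode writes back to EXC_ADDR before the HW_REI.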


3.8 Ibox IPRs 
3.8.1 TB_TAG 


The TB_TAG register is a write-only register which holds the tag for the next TB update 
operation in either the ITB or DTB. To insure the integrity of the TB, the tag is actually 
written to a temporary register and not transferred to the ITB or DTB until the ITB_PTE or 
DTB_PTE register is written. The entry to be written is chosen at the time of the ITB_PTE 
or DTB_PTE write operation by a not-last-used algorithm implemented in hardware. 


Writing the ITB_TAG register is only performed while in PALmode regardless of the state of 
the HWE bit in the ICCSR IPR. 


Small Page Format: 


[Format diagram not recoverable from the source.]


GH = 11(bin) Format (DTB only):


[Format diagram not recoverable from the source.]


3.8.2 ITB_PTE 


The ITB_PTE register is a read/write register representing the eight ITB page table entries.
The entry to be written is chosen by a not-last-used algorithm implemented in hardware. 


Writes to the ITB_PTE use the memory format bit positions as described in the ALPHA SRM 
with the exception that some fields are ignored. 


To insure the integrity of the ITB, the ITB’s tag array is updated simultaneously from the 
internal tag register when the ITB_PTE register is written. Reads of the ITB_PTE require 
two instructions. First, a read from the ITB_PTE sends the PTE data to the ITB_PTE_TEMP 
register, then a second instruction reading from the ITB_PTE_TEMP register returns the PTE 
entry to the register file. Reading or writing the ITB_PTE register increments the TB entry 
pointer which allows reading the entire set of eight ITB PTE entries. 


Reading and writing the ITB_PTE register is only performed while in PALmode regardless 
of the state of the HWE bit in the ICCSR IPR. 


Write Format: 


 6       5 5             3 3       1 1 1 0 0 0   0 0 0   0
 3       3 2             2 1       2 1 0 9 8 7   5 4 3   0
+---------+---------------+---------+-+-+-+-+-----+-+-----+
|         |               |         |U|S|E|K|     |A|     |
|   IGN   |  PFN[33..13]  |   IGN   |R|R|R|R| IGN |S| IGN |
|         |               |         |E|E|E|E|     |M|     |
+---------+---------------+---------+-+-+-+-+-----+-+-----+

Read Format:

 6         3 3 3             1 1 1 1 0 0     0
 3         5 4 3             3 2 1 0 9 8     0
+-----------+-+---------------+-+-+-+-+-------+
|           |A|               |U|S|E|K|       |
|    RAZ    |S|  PFN[33..13]  |R|R|R|R|  RAZ  |
|           |M|               |E|E|E|E|       |
+-----------+-+---------------+-+-+-+-+-------+
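

The relationship between the two formats can be sketched as a field relocation. This is a minimal Python illustration under stated assumptions: the function name is hypothetical, and the bit positions are taken from a reading of the format diagrams above (write: PFN in bits 52..32, URE/SRE/ERE/KRE in bits 11..8, ASM in bit 4; read: ASM in bit 34, PFN in bits 33..13, URE/SRE/ERE/KRE in bits 12..9).

```python
def itb_pte_write_to_read(write_value: int) -> int:
    """Map an ITB_PTE write-format value to the read format (illustrative)."""
    pfn = (write_value >> 32) & 0x1FFFFF   # PFN[33..13], 21 bits
    prot = (write_value >> 8) & 0xF        # URE, SRE, ERE, KRE
    asm = (write_value >> 4) & 0x1         # address space match
    # All other write-format bits are ignored (IGN) and read back as zero.
    return (asm << 34) | (pfn << 13) | (prot << 9)
```

The sketch covers only the bit layout; it does not model the two-instruction read through ITB_PTE_TEMP or the TB entry pointer.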


3.8.3 ICCSR 


The ICCSR register contains various Ibox hardware enables. The only architecturally defined 
bit in this register is the FPE, floating point enable, which enables floating point instruction 
execution. When clear, all floating point instructions generate FEN exceptions. This register 
is cleared by hardware at reset. The HWE bit allows the special PAL instructions to execute 
in kernel mode. This bit is intended for diagnostics or operating system alternative PAL
routines only. It does not allow access to the ITB registers while not running in PALmode. 


Therefore, some PALcode flows may require the PALmode environment to execute properly 
(e.g. ITB fill). 


EV4 implements all of the ICCSR functionality described below. EV3 does not contain
performance counters, a branch history table, or ASN support. It does, however, maintain 
register state for the performance counter control bits, the BHE bit, and the ASN field. These 
register bits may be read and written but otherwise do not affect any hardware function. 


Write Format: 


 6   5 5      4 4   4 4 4 4 3 3 3 3 3 3   3 3   1 1   0 0 0 0 0 0 0 0
 3   3 2      7 6   3 2 1 0 9 8 7 6 5 4   2 1   2 1   8 7 5 4 3 2 1 0
+-----+--------+-----+-+-+-+-+-+-+-+-+-----+-----+-----+---+-+-+---+-+
|     |        | IC  |F|I|H|D|B|J|B|V| PC  |     | PC  |   |I|P|   |P|
| IGN |ASN[5:0]|[5:2]|P|C|W|I|H|S|P|A|MUX1 | IGN |MUX0 |IGN|C|C|IGN|C|
|     |        |     |E|1|E| |E|E|E|X|[2:0]|     |[3:0]|   |0|0|   |1|
+-----+--------+-----+-+-+-+-+-+-+-+-+-----+-----+-----+---+-+-+---+-+

Read Format:

 6   3 3 3      2 2   2 2 2 2 2 1 1 1 1 1   1 1   0 0   0 0 0 0
 3   5 4 3      8 7   4 3 2 1 0 9 8 7 6 5   3 2   9 8   3 2 1 0
+-----+-+--------+-----+-+-+-+-+-+-+-+-+-----+-----+-----+-+-+-+
|     |I|        | IC  |F|I|H|D|B|J|B|V| PC  | PC  |     |P|P|R|
| RAZ |C|ASN[5:0]|[5:2]|P|C|W|I|H|S|P|A|MUX1 |MUX0 | RAZ |C|C|A|
|     |0|        |     |E|1|E| |E|E|E|X|[2:0]|[3:0]|     |1|0|Z|
+-----+-+--------+-----+-+-+-+-+-+-+-+-+-----+-----+-----+-+-+-+


Table 3-9: ICCSR 


Field     Type    Description

FPE       RW,0    If set, floating point instructions can be issued. If clear, floating
                  point instructions cause FEN exceptions.

HWE       RW,0    If set, allows the five PALRES instructions to be issued in kernel mode.

DI        RW,0    If set, enables dual issue.

BHE       RW,0    Used in conjunction with BPE. See Table 3-10 for programming
                  information. This bit is ignored in EV3.

JSE       RW,0    If set, enables the JSR stack to push return addresses.

BPE       RW,0    Used in conjunction with BHE. See Table 3-10 for programming
                  information.

VAX       RW,0    If clear, causes all hardware interlocked instructions to drain the
                  machine and wait for the write buffer to empty before issuing the next
                  instruction. Examples of instructions that do not cause the pipe to
                  drain include HW_MTPR, HW_REI, conditional branches, and instructions
                  that have a destination register of R31.

PCMUX1    RW,0    See Table 3-12 for programming information. Performance counters are
                  present only in EV4.

PCMUX0    RW,0    See Table 3-11 for programming information. Performance counters are
                  present only in EV4.

PC1       RW,0    If clear, enables performance counter 1 interrupt request after 2**12
                  events counted. If set, enables performance counter 1 interrupt request
                  after 2**8 events counted.

PC0       RW,0    If clear, enables performance counter 0 interrupt request after 2**16
                  events counted. If set, enables performance counter 0 interrupt request
                  after 2**12 events counted.

ASN       RW,0    The Address Space Number field is used in conjunction with the Icache in
                  EV4 to further qualify cache entries and avoid some cache flushes. The
                  ASN is written to the Icache during fill operations and compared with
                  the I-stream data on fetch operations. Mismatches invalidate the fetch
                  without affecting the Icache. This function is only present in EV4.

IC        RW,0    The IC state bits are unused by hardware.


Table 3-10: BHE,BPE Branch Prediction Selection


BPE    BHE    Prediction

0      X      Not Taken
1      0      Sign of Displacement
1      1      Branch History Table (Not available in EV3)
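

The selection in Table 3-10 can be expressed as a short lookup. This is a minimal Python sketch for EV4 only; the function name is hypothetical, and EV3 behavior with both bits set is deliberately left out because the table only states that the history table is not available there.

```python
def branch_prediction_mode(bpe: int, bhe: int) -> str:
    """Return the EV4 prediction scheme selected by ICCSR BPE and BHE."""
    if not bpe:
        return "Not Taken"            # BHE is a don't-care when BPE = 0
    if not bhe:
        return "Sign of Displacement"
    return "Branch History Table"
```

Since ICCSR is cleared at reset, the power-up selection is the first row, Not Taken.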


3.8.3.1 Performance Counters 


Performance counters are only available in EV4. They are reset to zero upon powerup, but 
are otherwise never cleared. They are intended as a means of counting events over a long 
period of time relative to the event frequency and therefore provide no means of extracting 
intermediate counter values. Since the counters continuously accumulate selected events 
despite interrupts being enabled, the first interrupt after selecting a new counter input has an 
error bound as large as the selected overflow range. In addition, some inputs may overcount 
events occurring simultaneously with D-stream errors which abort the actual event very 
late in the pipeline. For example, when counting load instructions, attempts to execute a 
load resulting in a DTB miss exception will increment the performance counter after the 
first aborted execution attempt and again after the TB fill routine when the load instruction 
reissues and completes. 


Performance counter interrupts are reported six cycles after the event that caused the counter 
to overflow. Additional delay may occur before an interrupt is serviced if the processor is 
executing PALcode which always disables interrupts. In either case, events occurring during 
the interval between counter overflow and interrupt service are counted toward the next 
interrupt. Only in the case of a complete counter wraparound while interrupts are disabled
will an interrupt be missed. 


The six cycles before an interrupt is triggered implies that a maximum of 12 instructions may 
have completed before the start of the interrupt service routine. In most cases, by examining 
the possible intervening instructions and the issue rules presented in section 2.9, it is possible 
to further isolate trigger events. Two cases always provide a more accurate exception PC. 
When counting Icache misses, no intervening instructions can complete and the exception PC 
contains the address of the last Icache miss. Branch mispredictions allow a maximum of only
2 instructions to complete before start of the interrupt service routine.
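

The arithmetic behind these figures can be sketched as follows. This is a minimal Python illustration with hypothetical function names: the thresholds come from the PC0 and PC1 descriptions in Table 3-9, and the bound of 12 instructions assumes two instructions issue in each of the six cycles.

```python
def overflow_threshold(counter: int, select_bit: int) -> int:
    """Events counted before an interrupt request, per the PC0/PC1 bits."""
    if counter == 0:
        return 2**12 if select_bit else 2**16
    if counter == 1:
        return 2**8 if select_bit else 2**12
    raise ValueError("EV4 has performance counters 0 and 1 only")


def max_intervening_instructions(latency_cycles: int = 6,
                                 issue_width: int = 2) -> int:
    """Upper bound on instructions completed before interrupt service."""
    return latency_cycles * issue_width
```

With the default arguments the second function reproduces the bound of 12 stated above; the Icache miss and branch misprediction cases are tighter than this general bound.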


Table 3-11: Performance Counter 0 Input Selection 

MUX0[3:0]    Input                   Comment

000X         Total Issues / 2        Counts total issues divided by 2, e.g. dual issue
                                     increments count by 1
001X         Pipeline Dry            Counts cycles where nothing issued due to lack of
                                     valid I-stream data. Causes include Icache fill,
                                     misprediction, branch delay slots and pipeline
                                     drain for exception
010X         Load Instructions       Counts all Load instructions
011X         Pipeline Frozen         Counts cycles where nothing issued due to resource
                                     conflict. Refer to section 2.9 for information
                                     regarding scheduling and issue rules.
100X         Branch Instructions     Counts all Branch instructions: conditional,
                                     unconditional, any JSR, HW_REI
1010         PALmode                 Counts cycles while executing in PAL mode
1011         Total cycles            Counts total cycles
110X         Total Non-issues / 2    Counts total non-issues divided by 2, e.g. no issue
                                     increments count by 1
111X         PERF_CNT_H<0>           Counts external event supplied by pin at selected
                                     system clock cycle interval


Table 3-12: Performance Counter 1 Input Selection


MUX1[2:0]    Input                   Comment

000          Dcache miss             Counts total Dcache misses
001          Icache miss             Counts total Icache misses
010          Dual issue              Counts cycles of Dual issue
011          Branch Mispredicts      Counts both conditional branch mispredictions and
                                     JSR or HW_REI mispredictions. Conditional branch
                                     mispredictions cost 4 cycles and others cost 5
                                     cycles of dry pipeline delay.
100          FP Instructions         Counts total floating point operate instructions,
                                     i.e. no FP branch, load, store
101          Integer Operate         Counts integer operate instructions including
                                     LDA,LDAH with destination other than R31
110          Store Instructions      Counts total store instructions
111          PERF_CNT_H<1>           Counts external event supplied by pin at selected
                                     system clock cycle interval
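

The two selection tables can be decoded with a simple lookup. This is a minimal Python sketch, assuming only the encodings listed in Tables 3-11 and 3-12; the function names are hypothetical, and an X in a MUX0 pattern is treated as a don't-care bit.

```python
PC0_INPUTS = {
    "000X": "Total Issues / 2",
    "001X": "Pipeline Dry",
    "010X": "Load Instructions",
    "011X": "Pipeline Frozen",
    "100X": "Branch Instructions",
    "1010": "PALmode",
    "1011": "Total cycles",
    "110X": "Total Non-issues / 2",
    "111X": "PERF_CNT_H<0>",
}

PC1_INPUTS = [
    "Dcache miss", "Icache miss", "Dual issue", "Branch Mispredicts",
    "FP Instructions", "Integer Operate", "Store Instructions",
    "PERF_CNT_H<1>",
]


def counter0_input(mux0: int) -> str:
    """Return the counter 0 input selected by the 4-bit MUX0 field."""
    bits = format(mux0 & 0xF, "04b")
    for pattern, name in PC0_INPUTS.items():
        if all(p in ("X", b) for p, b in zip(pattern, bits)):
            return name
    raise ValueError("no matching pattern")  # unreachable for 4-bit input


def counter1_input(mux1: int) -> str:
    """Return the counter 1 input selected by the 3-bit MUX1 field."""
    return PC1_INPUTS[mux1 & 0x7]
```

Both fields are zero after reset, which selects Total Issues / 2 and Dcache miss, matching the ICCSR entry in Table 3-8.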


3.8.4 ITB_PTE_TEMP 


The ITB_PTE_TEMP register is a read-only holding register for ITB_PTE read data. Reads 
of the ITB_PTE require two instructions to return the data to the register file. The first 
reads the ITB_PTE register to the ITB_PTE_TEMP register. The second returns the
ITB_PTE_TEMP register to the integer register file. The ITB_PTE_TEMP register is updated on
all ITB accesses, both read and write. A read of the ITB_PTE to the ITB_PTE_TEMP should 
be followed closely by a read of the ITB_PTE_TEMP to the register file. 


Reading the ITB_PTE_TEMP register is only performed while in PALmode regardless of the 
state of the HWE bit in the ICCSR IPR. 


 6         3 3 3             1 1 1 1 0 0     0
 3         5 4 3             3 2 1 0 9 8     0
+-----------+-+---------------+-+-+-+-+-------+
|           |A|               |U|S|E|K|       |
|    RAZ    |S|  PFN[33..13]  |R|R|R|R|  RAZ  |
|           |M|               |E|E|E|E|       |
+-----------+-+---------------+-+-+-+-+-------+


3.8.5 EXC_ADDR 


The EXC_ADDR register is a read/write register used to restart the machine after exceptions 
or interrupts. It is written by hardware with the PC of the excepting instruction, or the 
currently executing instruction at the time of an interrupt or trap. The instruction pointed 
to by the EXC_ADDR register did not complete execution. The EXC_ADDR register can also 
be read and written directly by PALcode. The HW_REI instruction executes a jump to the 
address contained in the EXC_ADDR register. Since the PC must be longword aligned, the 
lsb of the EXC_ADDR register is used to indicate PALmode to the hardware.


Note that bit[1] is undefined when the EXC_ADDR is read. The actual hardware ignores this
bit; however, PALcode must explicitly clear this bit before it pushes the exception address on
the stack.


EV3 requires that the EXC_ADDR register be written before executing a HW_REI. This 
restriction applies because the register may not be sign extended despite a read of the same 
register indicating so. This restriction does not apply for the EV4 chip. 


IPR Format: 

 6                           0 0 0
 3                           2 1 0
+-----------------------------+-+-+
|                             |I|P|
|          PC[63..2]          |G|A|
|                             |N|L|
+-----------------------------+-+-+
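

The handling of the two low-order bits can be sketched as follows. This is a minimal Python illustration with hypothetical function names, assuming the layout described above: bit 0 indicates PALmode, bit 1 is undefined on reads, and the PC occupies bits 63..2.

```python
MASK64 = (1 << 64) - 1


def is_palmode(exc_addr: int) -> bool:
    """Bit 0 of EXC_ADDR indicates PALmode."""
    return bool(exc_addr & 0x1)


def clear_undefined_bit(exc_addr: int) -> int:
    """Clear bit 1 before the address is pushed on the stack."""
    return exc_addr & ~0x2 & MASK64


def exception_pc(exc_addr: int) -> int:
    """Longword-aligned PC held in bits 63..2."""
    return exc_addr & ~0x3 & MASK64
```

For example, a read value of 0x2003 describes a PALmode PC of 0x2000 with the undefined bit set; clearing bit 1 yields 0x2001.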


3.8.6 SL_CLR 


This write-only register clears the serial line interrupt request, the performance counter 
interrupt request and the CRD interrupt request. EV3 does not contain performance counters 
and cannot initiate CRD interrupt requests. Therefore, the write of any data to the SL_CLR 
register will clear the remaining serial line interrupt request. EV4 requires that the indicated 
bit be written with a zero to clear the selected interrupt source. 


 6       3 3 3       1 1 1     0 0 0     0 0 0   0
 3       3 2 1       6 5 4     9 8 7     3 2 1   0
+---------+-+---------+-+-------+-+-------+-+-----+
|         |S|         |P|       |P|       |C|     |
|   IGN   |L|   IGN   |C|  IGN  |C|  IGN  |R| IGN |
|         |C|         |0|       |1|       |D|     |
+---------+-+---------+-+-------+-+-------+-+-----+

Table 3-13: SL_CLR

Field    Type    Description

CRD      W0C     Clears the correctable read error interrupt request.
PC1      W0C     Clears the performance counter 1 interrupt request.
PC0      W0C     Clears the performance counter 0 interrupt request.
SLC      W0C     Clears the serial line interrupt request.


3.8.7 SL_RCV 


The serial line receive register contains a single read-only bit used with the interrupt control 
registers and the sRomD_h and sRomClk_h pins to provide an on-chip serial line function. 
The RCV bit is functionally connected to the sRomD_h pin after the Icache is loaded from
the external serial ROM. Reading the RCV bit can be used to receive external data one bit
at a time under a software timing loop. A serial line interrupt is requested on detection of
any transition on the receive line, which sets the SL_REQ bit in the HIRR. The serial line
interrupt can be disabled by clearing the HIER register SL_ENA bit.


EV4 IPR

 6                           0 0 0   0
 3                           4 3 2   0
+-----------------------------+-+-----+
|                             |R|     |
|             RAZ             |C| RAZ |
|                             |V|     |
+-----------------------------+-+-----+

EV3 IPR

 6                           0 0 0   0
 3                           5 4 3   0
+-----------------------------+-+-----+
|                             |R|     |
|             RAZ             |C| RAZ |
|                             |V|     |
+-----------------------------+-+-----+


3.8.8 ITBZAP 


A write of any value to this IPR invalidates all eight ITB entries. It also resets the NLU 
pointer to its initial state. The ITBZAP register should only be written in PAL mode. 


3.8.9 ITBASM 


A write of any value to this IPR invalidates all ITB entries in which the ASM bit is equal to 
zero. The ITBASM register should only be written in PAL mode. 


3.8.10 ITBIS 


A write of any value to this IPR invalidates all eight ITB entries. It also resets the NLU 
pointer to its initial state. The ITBIS register should only be written in PAL mode. 


3.8.11 PS 


The processor status register is a read/write register containing only the current mode bits 
of the architecturally defined PS. 


Write Format: 


 6                           0 0 0 0   0
 3                           5 4 3 2   0
+-----------------------------+-+-+-----+
|                             |C|C|     |
|             IGN             |M|M| IGN |
|                             |1|0|     |
+-----------------------------+-+-+-----+


Read Format:


 6           3 3 3                   0 0 0
 3           5 4 3                   2 1 0
+-------------+-+---------------------+-+-+
|             |C|                     |C|R|
|     RAZ     |M|         RAZ         |M|A|
|             |1|                     |0|Z|
+-------------+-+---------------------+-+-+


3.8.12 EXC_SUM 


The exception summary register records the various types of arithmetic traps that have 
occurred since the last time the EXC_SUM was written (cleared). When the result of an 
arithmetic operation produces an arithmetic trap, the corresponding EXC_SUM bit is set. 


In addition, the register containing the result of that operation is recorded in the exception
register write mask IPR, as a single bit in a 64-bit field specifying registers F31-F0 and I31-I0.
This IPR is visible only through the EXC_SUM register. The EXC_SUM register provides a
one-bit window to the exception register write mask. Each read to the EXC_SUM shifts one
bit in order F31-F0 then I31-I0. The read also clears the corresponding bit. Therefore, the
EXC_SUM must be read 64 times to extract the complete mask and clear the entire register.


Any write to EXC_SUM clears bits [8..2] and does not affect the write mask. 


The write mask register bit clears three cycles after a read. Therefore, code intended to 
read the register must allow at least three cycles between reads to allow the clear and shift 
operation to complete in order to insure reading successive bits. 
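

The mapping from the 64 successive MSK reads to register names can be sketched as follows. This is a minimal Python illustration, not PALcode: the function names are hypothetical, and the input is assumed to be the MSK bit value from each of 64 reads, in the order they were performed with the required spacing between reads.

```python
def write_mask_order() -> list:
    """Register named by each successive EXC_SUM read: F31-F0, then I31-I0."""
    floats = [f"F{n}" for n in range(31, -1, -1)]
    ints = [f"I{n}" for n in range(31, -1, -1)]
    return floats + ints


def decode_write_mask(msk_bits) -> list:
    """Return the registers whose write mask bit was set.

    msk_bits holds the MSK value seen on each of 64 successive reads.
    """
    bits = list(msk_bits)
    if len(bits) != 64:
        raise ValueError("the complete mask requires exactly 64 reads")
    return [name for name, bit in zip(write_mask_order(), bits) if bit]
```

For instance, a set bit on the first read names F31, and a set bit on the thirty-fourth read names I30.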


 6           3 3 3           0 0 0 0 0 0 0 0 0   0
 3           4 3 2           9 8 7 6 5 4 3 2 1   0
+-------------+-+-------------+-+-+-+-+-+-+-+-----+
|             |M|             |I|I|U|F|D|I|S|     |
|     RAZ     |S|     RAZ     |O|N|N|O|Z|N|W| RAZ |
|             |K|             |V|E|F|V|E|V|C|     |
+-------------+-+-------------+-+-+-+-+-+-+-+-----+


Table 3-14: EXC_SUM


Field    Type    Description

SWC      WA      Indicates Software Completion possible. The bit is set after a floating
                 point instruction containing the /S modifier completes with an arithmetic
                 trap and all previous floating point instructions that trapped since the
                 last MTPR EXC_SUM also contained the /S modifier. The SWC bit is cleared
                 whenever a floating point instruction without the /S modifier completes
                 with an arithmetic trap. The bit remains cleared regardless of additional
                 arithmetic traps until the register is written via an MTPR instruction.
                 The bit is always cleared upon any MTPR write to the EXC_SUM register.
INV      WA      Indicates Invalid Operation.
DZE      WA      Indicates Divide by Zero.
FOV      WA      Indicates Floating Point Overflow.
UNF      WA      Indicates Floating Point Underflow.
INE      WA      Indicates Floating Inexact Error.
IOV      WA      Indicates Fbox Convert to Integer Overflow or Integer Arithmetic Overflow.
MSK      RC      Exception Register Write Mask IPR Window.


3.8.13 PAL_BASE 


The PAL base register is a read/write register containing the base address for PALcode. This 
register is cleared by hardware at reset. 


PAL base register format: 
 6           3 3                   1 1           0
 3           4 3                   4 3           0
+-------------+---------------------+-------------+
|             |                     |             |
|   IGN/RAZ   |  PAL_BASE[33..14]   |   IGN/RAZ   |
|             |                     |             |
+-------------+---------------------+-------------+


3.8.14 HIRR 


The Hardware Interrupt Request Register is a read-only register providing a record of all
currently outstanding interrupt requests and summary bits at the time of the read. For each
bit of the HIRR [5:0] there is a corresponding bit of the HIER (Hardware Interrupt Enable
Register) that must be set to request an interrupt. In addition to returning the status of the
hardware interrupt requests, a read of the HIRR returns the state of the software interrupt
and AST requests. Note that a read of the HIRR may return a value of zero if the hardware
interrupt was released before the read (passive release). The register guarantees that the
HWR bit reflects the status as shown by the HIRR bits. All interrupt requests are blocked
while executing in PALmode.


Read Format:


 6     3 3     2 2     1 1 1    1 0 0 0    0 0 0 0 0 0
 3     3 2     9 8     4 3 2    0 9 8 7    5 4 3 2 1 0
+-------+-------+-------+-+------+-+-+------+-+-+-+-+-+
|       |U S E K|       |S|      |P|P|      |C|A|S|H|R|
|  RAZ  | ASTRR | SIRR  |L| HIRR |C|C| HIRR |R|T|W|W|A|
|       |[3..0] |[15..1]|R|[2..0]|0|1|[5..3]|R|R|R|R|Z|
+-------+-------+-------+-+------+-+-+------+-+-+-+-+-+
Table 3-15: HIRR

Field          Type    Description

HWR            RO      Is set if any hardware interrupt request and corresponding enable
                       is set.
SWR            RO      Is set if any software interrupt request and corresponding enable
                       is set.
ATR            RO      Is set if any AST request and corresponding enable is set. This bit
                       also requires that the processor mode be equal to or higher than
                       the request mode. In EV4 chips, a further requirement is that
                       SIER[2] must be set to allow AST interrupt requests.
HIRR[5..0]     RO      Corresponds to pins Irq_h[5..0].
SIRR[15..1]    RO      Corresponds to software interrupt requests 15 thru 1.
ASTRR[3..0]    RO      Corresponds to AST requests three thru zero (USEK).
PC1            RO      Performance counter 1 interrupt request. Performance counters are
                       only present in EV4.
PC0            RO      Performance counter 0 interrupt request. Performance counters are
                       only present in EV4.
SLR            RO      Serial line interrupt request.
CRR            RO      CRD correctable read error interrupt request. This bit is only
                       present in EV4 chips and reads as zero in EV3.
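

Unpacking a value read from the HIRR can be sketched as follows. This is a minimal Python illustration with hypothetical names; the bit positions are an assumption read from the HIRR read format (HWR bit 1, SWR bit 2, ATR bit 3, CRR bit 4, HIRR[5..3] bits 7..5, PC1 bit 8, PC0 bit 9, HIRR[2..0] bits 12..10, SLR bit 13, SIRR[15..1] bits 28..14, ASTRR[3..0] bits 32..29).

```python
# name: (least significant bit, width), assumed from the read format.
HIRR_FIELDS = {
    "HWR": (1, 1), "SWR": (2, 1), "ATR": (3, 1), "CRR": (4, 1),
    "HIRR[5..3]": (5, 3), "PC1": (8, 1), "PC0": (9, 1),
    "HIRR[2..0]": (10, 3), "SLR": (13, 1),
    "SIRR[15..1]": (14, 15), "ASTRR[3..0]": (29, 4),
}


def decode_hirr(value: int) -> dict:
    """Split a 64-bit HIRR read value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in HIRR_FIELDS.items()}


def hardware_requests(value: int) -> int:
    """Reassemble HIRR[5..0], which is split across two fields on reads."""
    fields = decode_hirr(value)
    return (fields["HIRR[5..3]"] << 3) | fields["HIRR[2..0]"]
```

The same decoding applies to reads of SIRR and ASTRR, since those reads return the same set of request and summary bits.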


3.8.15 SIRR 


The Software Interrupt Request Register is a read/write register used to control software
interrupt requests. For each bit of the SIRR there is a corresponding bit of the SIER (Software
Interrupt Enable Register) that must be set to request an interrupt. Reads of the SIRR return
the complete set of interrupt request registers and summary bits; see the HIRR Table 3-15
for details. All interrupt requests are blocked while executing in PALmode.


Write Format: 


[Format diagram not recoverable from the source.]


Read Format:


 6     3 3     2 2     1 1 1    1 0 0 0    0 0 0 0 0 0
 3     3 2     9 8     4 3 2    0 9 8 7    5 4 3 2 1 0
+-------+-------+-------+-+------+-+-+------+-+-+-+-+-+
|       |U S E K|       |S|      |P|P|      |C|A|S|H|R|
|  RAZ  | ASTRR | SIRR  |L| HIRR |C|C| HIRR |R|T|W|W|A|
|       |[3..0] |[15..1]|R|[2..0]|0|1|[5..3]|R|R|R|R|Z|
+-------+-------+-------+-+------+-+-+------+-+-+-+-+-+


3.8.16 ASTRR 


The Asynchronous Trap Request Register is a read/write register. It contains bits to request
AST interrupts in each of the processor modes. In order to generate an AST interrupt, the
corresponding enable bit in the ASTER must be set and the processor must be in the selected
processor mode or higher privilege as described by the current value of the PS CM bits.
In addition, AST interrupts are only enabled in EV4 if the SIER[2] is set. This provides a
mechanism to lock out AST requests over certain IPL levels. In EV3, this function is provided
in PAL code. All interrupt requests are blocked while executing in PALmode. Reads of the
ASTRR return the complete set of interrupt request registers and summary bits; see the
HIRR Table 3-15 for details.


Write Format: 


 6         5 5 5 4 4 4                      0
 3         2 1 0 9 8 7                      0
+-----------+-+-+-+-+------------------------+
|           |U|S|E|K|                        |
|    IGN    |A|A|A|A|          IGN           |
|           |R|R|R|R|                        |
+-----------+-+-+-+-+------------------------+


Read Format:


 6       3 3     2 2         1 1 1      1 0 0 0      0 0 0 0 0 0
 3       3 2     9 8         4 3 2      0 9 8 7      5 4 3 2 1 0
+---------+-------+-----------+-+--------+-+-+--------+-+-+-+-+-+
|         |U|S|E|K|           |S|        |P|P|        |C|A|S|H|R|
|   RAZ   | ASTRR |   SIRR    |L|  HIRR  |C|C|  HIRR  |R|T|W|W|A|
|         |[3..0] |  [15..1]  |R| [2..0] |0|1| [5..3] |R|R|R|R|Z|
+---------+-------+-----------+-+--------+-+-+--------+-+-+-+-+-+
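
Because reads of the HIRR, SIRR and ASTRR all return the same combined layout, a single decoder can serve all three. The following is a minimal sketch, assuming the bit positions shown in the Read Format diagram (ASTRR in bits 32..29, SIRR[15..1] in bits 28..14, SLR in bit 13, HIRR[2..0] in bits 12..10, PC0 in bit 9, PC1 in bit 8, HIRR[5..3] in bits 7..5, and the summary bits in bits 4..1). The function name `decode_irr_read` is illustrative only and is not part of the specification.

```python
def decode_irr_read(value: int) -> dict:
    """Split a 64-bit HIRR/SIRR/ASTRR read value into named fields."""

    def bits(hi: int, lo: int) -> int:
        # Extract value[hi..lo] as an unsigned integer.
        return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

    return {
        "ASTRR": bits(32, 29),                     # ASTRR[3..0]: U, S, E, K
        "SIRR": bits(28, 14) << 1,                 # SIRR[15..1]; bit n = request n
        "SLR": bits(13, 13),
        "HIRR": (bits(7, 5) << 3) | bits(12, 10),  # HIRR[5..3] joined with HIRR[2..0]
        "PC0": bits(9, 9),
        "PC1": bits(8, 8),
        "CRR": bits(4, 4),
        "ATR": bits(3, 3),
        "SWR": bits(2, 2),
        "HWR": bits(1, 1),
    }
```

The SIRR field is shifted left by one so that bit n of the result corresponds to software interrupt request n, matching the [15..1] numbering used in the diagram.
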


3.8.17 HIER 


The Hardware Interrupt Enable Register is a read/write register. It is used to enable the
corresponding bits of the HIRR to request interrupts. The PC0, PC1, SLE and CRE bits
of this register enable the performance counter, serial line and correctable read interrupts.
There is a one-to-one correspondence between the interrupt request and enable bits. As with
reads of the interrupt request IPRs, reads of the HIER return the complete set of interrupt
enable registers; see the HIRR description (Table 3-15) for details.


Since the CRD interrupt request is not supported in EV3, the CRE bit is not present in the
EV3 register. It is ignored on writes and reads back as zero.


Write Format:


 6       3 3 3       1 1 1          0 0 0     0 0 0   0
 3       3 2 1       6 5 4          9 8 7     3 2 1   0
+---------+-+---------+-+------------+-+-------+-+-----+
|         |S|         |P|            |P|       |C|     |
|   IGN   |L|   IGN   |C| HIER[5..0] |C|  IGN  |R| IGN |
|         |E|         |1|            |0|       |E|     |
+---------+-+---------+-+------------+-+-------+-+-----+


Read Format:


 6       3 3     2 2         1 1 1      1 0 0 0      0 0 0     0
 3       3 2     9 8         4 3 2      0 9 8 7      5 4 3     0
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+
|         |U|S|E|K|           |S|        |P|P|        |C|       |
|   RAZ   |A|A|A|A|SIER[15..1]|L|  HIER  |C|C|  HIER  |R|  RAZ  |
|         |E|E|E|E|           |E| [2..0] |0|1| [5..3] |E|       |
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+


3.8.18 SIER 


The Software Interrupt Enable Register is a read/write register. It is used to enable the
corresponding bits of the SIRR to request interrupts. There is a one-to-one correspondence
between the interrupt request and enable bits. As with reads of the interrupt request
IPRs, reads of the SIER return the complete set of interrupt enable registers; see the HIRR
description (Table 3-15) for details.


The CRE bit is only supported in EV4. Reads of this register will always return zero on the
CRE bit in EV3.


Write Format:


[Write Format diagram not recoverable in this copy.]


Read Format:


 6       3 3     2 2         1 1 1      1 0 0 0      0 0 0     0
 3       3 2     9 8         4 3 2      0 9 8 7      5 4 3     0
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+
|         |U|S|E|K|           |S|        |P|P|        |C|       |
|   RAZ   |A|A|A|A|SIER[15..1]|L|  HIER  |C|C|  HIER  |R|  RAZ  |
|         |E|E|E|E|           |E| [2..0] |0|1| [5..3] |E|       |
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+


3.8.19 ASTER


The AST Interrupt Enable Register is a read/write register. It is used to enable the
corresponding bits of the ASTRR to request interrupts. There is a one-to-one correspondence
between the interrupt request and enable bits. As with reads of the interrupt request IPRs,
reads of the ASTER return the complete set of interrupt enable registers; see the HIRR
description (Table 3-15) for details.


The CRE bit is only supported in EV4. Reads of this register will always return zero on the
CRE bit in EV3.


Write Format: 


 6         5 5 5 4 4 4                      0
 3         2 1 0 9 8 7                      0
+-----------+-+-+-+-+------------------------+
|           |U|S|E|K|                        |
|    IGN    |A|A|A|A|          IGN           |
|           |E|E|E|E|                        |
+-----------+-+-+-+-+------------------------+


Read Format:


 6       3 3     2 2         1 1 1      1 0 0 0      0 0 0     0
 3       3 2     9 8         4 3 2      0 9 8 7      5 4 3     0
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+
|         |U|S|E|K|           |S|        |P|P|        |C|       |
|   RAZ   |A|A|A|A|SIER[15..1]|L|  HIER  |C|C|  HIER  |R|  RAZ  |
|         |E|E|E|E|           |E| [2..0] |0|1| [5..3] |E|       |
+---------+-+-+-+-+-----------+-+--------+-+-+--------+-+-------+


3.8.20 SL_XMIT 


The serial line transmit register contains a single write-only bit used with the interrupt
control registers and the sRomD_h and sRomClk_h pins to provide an on-chip serial line
function. The TMT bit is functionally connected to the sRomClk_h pin after the Icache is
loaded from the external serial ROM. Writing the TMT bit can be used to transmit data off
chip one bit at a time under a software timing loop.


 6                         0 0 0         0
 3                         5 4 3         0
+---------------------------+-+-----------+
|                           |T|           |
|            IGN            |M|    IGN    |
|                           |T|           |
+---------------------------+-+-----------+
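
The software timing loop mentioned above can be sketched as follows. This is an illustration only: the callbacks `write_sl_xmit` and `delay_one_bit` are hypothetical stand-ins for a PALcode HW_MTPR to SL_XMIT and a calibrated delay, and the frame layout (one start bit, eight data bits sent least significant bit first, one stop bit) is an assumption, since the specification does not define a line protocol. TMT is taken to be bit 4, as the register diagram indicates.

```python
TMT_BIT = 4  # position of TMT within SL_XMIT, per the register diagram


def transmit_byte(byte, write_sl_xmit, delay_one_bit):
    """Shift one byte out through TMT, one bit per timing interval."""
    # Assumed framing: start bit (0), 8 data bits LSB first, stop bit (1).
    frame = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    for bit in frame:
        write_sl_xmit(bit << TMT_BIT)  # all other bits of SL_XMIT are ignored
        delay_one_bit()                # software timing loop sets the bit rate
```
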
3.9 Abox IPRs


3.9.1 DTB_CTL


The large-page-select (GH) field selects between the EVx small-page and large-page
DTBs for DTB fills. If GH=11(bin) then the large-page DTB is chosen for DTB_PTE writes
and reads. If GH is anything else then the small-page DTB is chosen for DTB_PTE writes
and reads. The GH field is write only.


 6                     0 0  0 0           0
 3                     7 6  5 4           0
+-----------------------+----+-------------+
|                       |    |             |
|          IGN          | GH |     IGN     |
|                       |    |             |
+-----------------------+----+-------------+


3.9.2 DTB_PTE


The DTB_PTE register is a read/write register representing the 32-entry small-page and
4-entry large-page DTB page table entries. The entry to be written is chosen by a
not-last-used algorithm implemented in hardware and by the value in the DTB_CTL register.
Writes to the DTB_PTE use the memory format bit positions as described in the ALPHA SRM,
with the exception that some fields are ignored. In particular, the valid bit is not
represented in hardware.


To ensure the integrity of the DTBs, the DTB's tag array is updated simultaneously from the
internal tag register when the DTB_PTE register is written. Reads of the DTB_PTE require
two instructions. First, a read from the DTB_PTE sends the PTE data to the DTB_PTE_TEMP
register; then a second instruction reading from the DTB_PTE_TEMP register returns
the PTE entry to the register file. Reading or writing the DTB_PTE register increments the
TB entry pointer of the DTB indicated by the DTB_CTL IPR, which allows reading the entire
set of DTB PTE entries.


Small Page Format:


[Diagram not recoverable in this copy.]


Large Page Format:


[Diagram not recoverable in this copy.]


3.9.3 DTB_PTE_TEMP


The DTB_PTE_TEMP register is a read-only holding register for DTB_PTE read data. Reads
of the DTB_PTE require two instructions to return the data to the register file. The first
reads the DTB_PTE register to the DTB_PTE_TEMP register. The second returns the
DTB_PTE_TEMP register to the integer register file.


Small Page Format:


[Diagram not recoverable in this copy; surviving bit labels: 63, 35, 34, 33, 13..5, 4, 3, 2, 0.]


Large Page Format:


[Diagram not recoverable in this copy.]


3.9.4 MM_CSR


When D-stream faults occur, the information about the fault is latched and saved in the
MM_CSR register. The VA and MM_CSR registers are locked against further updates until software
reads the VA register. PALcode must explicitly unlock this register whenever its entry point
was higher in priority than a DTB miss. MM_CSR bits are only modified by hardware when
the register is not locked and a memory management error or a DTB miss occurs. The
MM_CSR is unlocked after reset.


 6           1 1      0 0    0 0 0 0 0
 3           5 4      9 8    4 3 2 1 0
+-------------+--------+------+-+-+-+-+
|             |        |      |F|F|A|W|
|     RAZ     | OPCODE |  RA  |O|O|C|R|
|             |        |      |W|R|V| |
+-------------+--------+------+-+-+-+-+


Table 3-16: MM_CSR


Field    Type   Description

WR       RO     Set if reference which caused error was a write.
ACV      RO     Set if reference caused an access violation.
FOR      RO     Set if reference was a read and the PTE's FOR bit was set.
FOW      RO     Set if reference was a write and the PTE's FOW bit was set.
RA       RO     Ra field of the faulting instruction.
OPCODE   RO     Opcode field of the faulting instruction.

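
A fault handler typically needs the MM_CSR fields individually. The sketch below assumes the layout implied by the bit labels of the register diagram and the field order of Table 3-16: WR in bit 0, ACV in bit 1, FOR in bit 2, FOW in bit 3, RA in bits 8..4 and OPCODE in bits 14..9. The function name is illustrative and not part of the specification.

```python
def decode_mm_csr(value: int) -> dict:
    """Split an MM_CSR read value into the fields of Table 3-16."""
    return {
        "WR": value & 1,                 # faulting reference was a write
        "ACV": (value >> 1) & 1,         # access violation
        "FOR": (value >> 2) & 1,         # fault on read
        "FOW": (value >> 3) & 1,         # fault on write
        "RA": (value >> 4) & 0x1F,       # Ra field of the faulting instruction
        "OPCODE": (value >> 9) & 0x3F,   # opcode of the faulting instruction
    }
```
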
3.9.5 VA 


When D-stream faults or DTB misses occur, the effective virtual address associated with the
fault or miss is latched in the read-only VA register. The VA and MM_CSR registers are locked
against further updates until software reads the VA register. The VA IPR is unlocked after
reset. PALcode must explicitly unlock this register whenever its entry point was higher in
priority than a DTB miss.


3.9.6 DTBZAP 


A write of any value to this IPR invalidates all 32 small-page and four large-page DTB entries. 
It also resets the NLU pointer to its initial state. 


3.9.7 DTBASM 


A write of any value to this IPR invalidates all 32 small-page and 4 large-page DTB entries 
in which the ASM bit is equal to zero. 


3.9.8 DTBIS 


If the virtual address in the RB field is mapped in either the small-page or large-page DTB 
then those entries are invalidated. 


3.9.9 FLUSH_IC 
A write of any value to this pseudo-IPR flushes the entire instruction cache. 


3.9.10 FLUSH_IC_ASM 


In EV4, a write of any value to this pseudo-IPR invalidates all Icache blocks in which the 
ASM bit is clear. In EV3, a write to this pseudo-register is equivalent to a NOP. 


3.9.11 ABOX_CTL


[Register diagram not recoverable in this copy; surviving bit labels: 63, 12, 11..0.]


Table 3-17: Abox Control Register


Field          Type   Description

WB_DIS         WO,0   Write buffer unload disable. When set, this bit prevents the write
                      buffer from sending write data to the BIU. It should be set for
                      diagnostics only.

MCHK_EN        WO,0   Machine check enable. When this bit is set the Abox generates
                      a machine check when errors which are not correctable by the
                      hardware are encountered. When this bit is cleared, uncorrectable
                      errors do not cause a machine check, but the BIU_STAT, DC_STAT,
                      BIU_ADDR, FILL_ADDR and DC_ADDR registers are updated and
                      locked when the errors occur.

CRD_EN -       WO,0   Corrected read data interrupt enable. When this bit is set the Abox
EV4 only              generates an interrupt request whenever a pin bus transaction is
                      terminated with a cAck_h code of SOFT_ERROR.

IC_SBUF_EN -   WO,0   Icache stream buffer enable. When set, this bit enables operation of
EV4 only              a single-entry Icache stream buffer.

DC_EN          WO,0   Dcache enable. When clear, this bit disables and flushes the Dcache.
                      When set, this bit enables the Dcache.

DC_FHIT        WO,0   Dcache force hit. When set, this bit forces all D-stream references to
                      hit in the Dcache. This bit takes precedence over DC_EN, i.e. when
                      DC_FHIT is set and DC_EN is clear all D-stream references hit in
                      the Dcache.
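
The precedence between DC_FHIT and DC_EN described in Table 3-17 can be summarized in a few lines. This is a sketch of the documented behavior only; the function name and the returned strings are illustrative.

```python
def dcache_mode(dc_en: bool, dc_fhit: bool) -> str:
    """Return the Dcache behavior selected by DC_EN and DC_FHIT."""
    if dc_fhit:
        # DC_FHIT takes precedence: every D-stream reference hits,
        # whether or not DC_EN is set.
        return "force-hit"
    return "enabled" if dc_en else "disabled"
```
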




3.9.12 ALT_MODE 


ALT_MODE is a write-only IPR. The AM field specifies the alternate processor mode used by
HW_LD and HW_ST instructions which have their ALT bit (bit 14) set.


 6                     0 0  0 0       0
 3                     5 4  3 2       0
+-----------------------+----+---------+
|                       |    |         |
|          IGN          | AM |   IGN   |
|                       |    |         |
+-----------------------+----+---------+


Table 3-18: ALT Mode


ALT_MODE[4..3]   Mode

00               Kernel
01               Executive
10               Supervisor
11               User
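
As a small worked example of Table 3-18, the value written to ALT_MODE can be formed by placing the two-bit mode code in bits [4..3]. The dictionary and function names below are illustrative only.

```python
# Mode encodings from Table 3-18.
ALT_MODES = {"kernel": 0b00, "executive": 0b01, "supervisor": 0b10, "user": 0b11}


def alt_mode_value(mode: str) -> int:
    """Return the value to write to ALT_MODE for the given processor mode."""
    return ALT_MODES[mode] << 3  # AM occupies ALT_MODE[4..3]; other bits are ignored
```
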
3.9.13 CC 


EVx supports a cycle counter as described in the ALPHA SRM. This counter, when enabled,
increments once each CPU cycle. HW_MTPR Rn,CC writes CC[63..32] with the value held
in Rn[63..32]; CC[31..0] are not changed. This register is read by the RCC instruction
defined in the ALPHA SRM.


3.9.14 CC_CTL 


HW_MTPR Rn,CC_CTL writes CC[31..0] with the value held in Rn[31..0]; CC[63..32] are
not changed. CC[3..0] must be written with zero. If Rn[32] is set then the counter is enabled,
otherwise the counter is disabled. CC_CTL is a write-only IPR.
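
The split between the CC and CC_CTL writes can be modeled as below. This is a behavioral sketch, not a description of the hardware: the class and method names are invented for illustration, and the choice to let only CC[31..0] count (wrapping modulo 2^32, with CC[63..32] acting as a software-written offset) is an assumption based on the cycle counter definition in the ALPHA SRM.

```python
MASK32 = 0xFFFFFFFF


class CycleCounter:
    """Toy model of the CC register and its two write paths."""

    def __init__(self):
        self.cc = 0
        self.enabled = False

    def write_cc(self, rn: int) -> None:
        # HW_MTPR Rn,CC: replace CC[63..32], leave CC[31..0] unchanged.
        self.cc = (rn & (MASK32 << 32)) | (self.cc & MASK32)

    def write_cc_ctl(self, rn: int) -> None:
        # HW_MTPR Rn,CC_CTL: replace CC[31..0]; Rn[32] enables the counter.
        if rn & 0xF:
            raise ValueError("CC[3..0] must be written with zero")
        self.cc = (self.cc & (MASK32 << 32)) | (rn & MASK32)
        self.enabled = bool((rn >> 32) & 1)

    def tick(self, cycles: int = 1) -> None:
        # Assumption: only the low longword counts, wrapping modulo 2**32.
        if self.enabled:
            low = (self.cc + cycles) & MASK32
            self.cc = (self.cc & (MASK32 << 32)) | low
```
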


3.9.15 BIU_CTL


[Register diagram not recoverable in this copy; surviving field labels: BC_ENA, ECC,
BC_FHIT, BC_SIZE, BAD_TCP, BC_PA_DIS, BAD_DP.]


Table 3-19: BIU Control Register


Field        Type   Description

BC_EN        WO,0   External cache enable. When clear, this bit disables the external cache.
                    When the external cache is disabled the BIU does not probe the external
                    cache tag store for read and write references; it launches a request on
                    cReq_h immediately.

ECC          WO     When this bit is set EVx generates/expects ECC on the check_h pins. When
                    this bit is clear EVx generates/expects parity on four of the check_h pins.

OE           WO,0   When this bit is set EVx does not assert its chip enable pins during RAM
                    write cycles, thus enabling these pins to be connected to the output
                    enable pins of the cache RAMs.

BC_FHIT      WO,0   External cache force hit. When this bit is set and BC_EN is also set, all
                    pin bus READ_BLOCK and WRITE_BLOCK transactions are forced to hit in
                    the external cache. Tag and tag control parity are ignored when the BIU
                    operates in this mode. BC_EN takes precedence over BC_FHIT: when BC_EN
                    is clear and BC_FHIT is set, no tag probes occur and external requests
                    are directed to the cReq_h pins.

                    Note that the BC_PA_DIS field takes precedence over the BC_FHIT bit.

BC_RD_SPD    WO     External cache read speed. This field indicates to the BIU the read access
                    time of the RAMs used to implement the off-chip external cache, measured
                    in CPU cycles. It should be written with a value equal to one less than
                    the read access time of the external cache RAMs.

                    Access times for reads must be in the range 16..3 CPU cycles, which means
                    the values for the BC_RD_SPD field are in the range of 15..2.

                    BC_RD_SPD is not initialized on reset and must be explicitly written
                    before enabling the external cache.


Table 3-19 (Cont.): BIU Control Register


Field        Type   Description

BC_WR_SPD    WO     External cache write speed. This field indicates to the BIU the write cycle
                    time of the RAMs used to implement the off-chip external cache, measured
                    in CPU cycles. It should be written with a value equal to one less than
                    the write cycle time of the external cache RAMs.

                    Access times for writes must be in the range 16..2 CPU cycles, which means
                    the values for the BC_WR_SPD field are in the range of 15..1.

                    BC_WR_SPD is not initialized on reset and must be explicitly written
                    before enabling the external cache.

BC_WE_CTL    WO     External cache write enable control. This field is used to control the
                    timing of the write enable and chip enable pins during writes into the
                    data and tag control RAMs. It consists of 15 bits, where each bit
                    determines the value placed on the write enable and chip enable pins
                    during a given CPU cycle of the RAM write access. When a given bit of
                    BC_WE_CTL is set, the write enable and chip enable pins are asserted
                    during the corresponding CPU cycle of the RAM access. BC_WE_CTL[0]
                    (bit 13 in BIU_CTL) corresponds to the second cycle of the write access,
                    BC_WE_CTL[1] (bit 14 in BIU_CTL) to the third CPU cycle, and so on. The
                    write enable pins will never be asserted in the first CPU cycle of a RAM
                    write access.

                    Unused bits in the BC_WE_CTL field must be written with zeros.

                    BC_WE_CTL is not initialized on reset and must be explicitly written
                    before enabling the external cache.

BC_SIZE      WO     This field is used to indicate the size of the external cache. BC_SIZE is
                    not initialized on reset and must be explicitly written before enabling
                    the external cache. See Table 3-20 for the encodings.

BAD_TCP -    WO,0   When set, BAD_TCP causes EV4 to write bad parity into the tag control
EV4 only            RAM whenever it does a fast external RAM write.

BC_PA_DIS    WO     This 4-bit field may be used to prevent the CPU chip from using the
                    external cache to service reads and writes based upon the quadrant of
                    physical address space which they reference. The correspondence between
                    this bit field and the physical address space is shown in Table 3-21.

                    When a read or write reference is presented to the BIU, the values of
                    BC_PA_DIS, BC_EN and physical address bits [33:32] together determine
                    whether to attempt to use the external cache to satisfy the reference.
                    If the external cache is not to be used for a given reference, the BIU
                    does not probe the tag store and makes the appropriate system request
                    immediately. The value of BC_PA_DIS has NO impact on which portions of
                    the physical address space may be cached in the primary caches. System
                    components control this via the RDACK field of the pin bus.

                    BC_PA_DIS is not initialized by reset.

BAD_DP -     WO     When set, BAD_DP causes EV4 to invert the value placed on bits [0], [7],
EV4 only            [14] and [21] of the check_h[27..0] field during off-chip writes. This
                    produces bad parity when EV4 is in parity mode, and bad check bit codes
                    when EV4 is in ECC mode.
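
The rules for the three timing fields in Table 3-19 reduce to simple arithmetic, sketched below. The helper names are hypothetical. The inputs are RAM timings already converted to CPU cycles, and for BC_WE_CTL the caller lists the cycles of the write access (numbered from 1) during which the write enable and chip enable pins should be asserted.

```python
def bc_rd_spd(read_access_cycles: int) -> int:
    """BC_RD_SPD value: one less than the read access time (3..16 cycles)."""
    if not 3 <= read_access_cycles <= 16:
        raise ValueError("read access time must be 3..16 CPU cycles")
    return read_access_cycles - 1


def bc_wr_spd(write_cycle_cycles: int) -> int:
    """BC_WR_SPD value: one less than the write cycle time (2..16 cycles)."""
    if not 2 <= write_cycle_cycles <= 16:
        raise ValueError("write cycle time must be 2..16 CPU cycles")
    return write_cycle_cycles - 1


def bc_we_ctl(asserted_cycles) -> int:
    """BC_WE_CTL mask: bit 0 is the second cycle of the write, bit 14 the sixteenth."""
    mask = 0
    for cycle in asserted_cycles:
        if not 2 <= cycle <= 16:
            # The pins are never asserted in the first cycle of a write.
            raise ValueError("write enable can only be asserted in cycles 2..16")
        mask |= 1 << (cycle - 2)
    return mask
```

For example, RAMs that need write enable asserted during the second and third cycles of the access would use `bc_we_ctl([2, 3])`, leaving all unused bits zero as the table requires.
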


Table 3-20: BC_SIZE


BC_SIZE   Size

000       128 Kbytes
001       256 Kbytes
010       512 Kbytes
011       1 Mbytes
100       2 Mbytes
101       4 Mbytes
110       8 Mbytes
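
Table 3-20 can be expressed as a lookup from cache size to field encoding. The sketch below is illustrative; the names are not part of the specification, and sizes outside the table are rejected rather than guessed.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

# Encodings copied from Table 3-20.
BC_SIZE_CODES = {
    128 * KBYTE: 0b000,
    256 * KBYTE: 0b001,
    512 * KBYTE: 0b010,
    1 * MBYTE: 0b011,
    2 * MBYTE: 0b100,
    4 * MBYTE: 0b101,
    8 * MBYTE: 0b110,
}


def bc_size_code(size_bytes: int) -> int:
    """Return the BC_SIZE encoding for an external cache of the given size."""
    try:
        return BC_SIZE_CODES[size_bytes]
    except KeyError:
        raise ValueError("unsupported external cache size") from None
```
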


Table 3-21: BC_PA_DIS

BIU_CTL bits Physical Address 


[32] PA[33..32] = 
[33] PA[33..32] = 
[34] PA[33..32] = 
[35] PA[33..32] = 


3.10 PAL_TEMPs 


The CPU chip contains 32 registers which are accessible via HW_MXPR instructions. These 
registers provide temporary storage for PALcode. 


3.10.1 DC_STAT


The DC_STAT is a read-only IPR.


Overview:


When an external ECC or parity error is recognized during a primary cache fill operation,
the DC_STAT register is locked against further updates. If the cache fill was
due to D-stream activity, the contents of this register may be used by PAL code in conjunction
with information latched elsewhere (see Section 3.12) to recover from some single-bit ECC
errors. DC_STAT is unlocked when DC_ADDR is read.


[Register diagram not recoverable in this copy; surviving bit labels: 63, 15, 14, 13, 12, 11,
10, 9, 8, 4, 3, 2, 0.]


Table 3-22: Dcache Status Register


Field     Type   Description

DC_HIT    RO     This bit indicates whether the last load or store instruction processed by
                 the Abox hit (DC_HIT set) or missed (DC_HIT clear) the Dcache. In EV4, loads
                 that miss the Dcache may be completed without requiring external reads,
                 e.g. pending fill or pending store hits.

SEO       RO     Second Error Occurred. Set when an error which would normally lock the
                 DC_STAT register occurs while the DC_STAT register is already locked.


The following bits are only meaningful if the FILL_ECC or FILL_DPERR bit in the
BIU_STAT register is set.


Table 3-23: Dcache STAT Error Modifiers


Field     Type   Description

RA        RO     The Ra field of the instruction which resulted in the error.

INT       RO     When set, indicates an integer load or store.

LW        RO     When set, indicates that the data length of the load or store was longword.

VAX_FP    RO     When INT is clear, this bit is set to indicate that a VAX floating point
                 format load or store caused the error.

LOCK      RO     This bit is set to indicate that the error stemmed from a LDL_L, LDQ_L,
                 STL_C, or STQ_C instruction.

STORE     RO     This bit is set to indicate that the error stemmed from a store instruction.


3.10.2 DC_ADDR


In EV3, this is a read-only register which contains bits [33..2] of the physical address
generated by the load instruction associated with errors reported by the FILL_ECC or
FILL_DPERR bits in the BIU_STAT register.


In EV4, this is a pseudo-register used for unlocking DC_STAT.
In both EV3 and EV4, DC_STAT and DC_ADDR are unlocked when DC_ADDR is read.






3.10.3 BIU_STAT 


BIU_STAT is a read-only IPR. 


When one of BIU_HERR, BIU_SERR, BC_TPERR or BC_TCPERR is set, BIU_STAT[6..0]
are locked against further updates, and the address associated with the error is latched and
locked in the BIU_ADDR register. BIU_STAT[6..0] and BIU_ADDR are also spuriously locked
when FILL_ECC or FILL_DPERR is set. BIU_STAT[7..0] and BIU_ADDR are unlocked when
the BIU_ADDR register is read.


When FILL_ECC or FILL_DPERR is set, BIU_STAT[13..8] are locked against further updates,
and the address associated with the error is latched and locked in the FILL_ADDR register.
BIU_STAT[14..8] and FILL_ADDR are unlocked when the FILL_ADDR register is read.


This register is not unlocked or cleared by reset and needs to be explicitly cleared by PALcode.


Table 3-24: BIU_STAT


Field        Type   Description

BIU_HERR     RO     This bit, when set, indicates that an external cycle was terminated with
                    the cAck_h pins indicating HARD_ERROR.

BIU_SERR     RO     This bit, when set, indicates that an external cycle was terminated with
                    the cAck_h pins indicating SOFT_ERROR.

BC_TPERR     RO     This bit, when set, indicates that an external cache tag probe encountered
                    bad parity in the tag address RAM.

BC_TCPERR    RO     This bit, when set, indicates that an external cache tag probe encountered
                    bad parity in the tag control RAM.

BIU_CMD      RO     This field latches the cycle type on the cReq_h pins when a BIU_HERR,
                    BIU_SERR, BC_TPERR, or BC_TCPERR error occurs.

BIU_SEO      RO     This bit, when set, indicates that an external cycle was terminated with
                    the cAck_h pins indicating HARD_ERROR or that an external cache tag probe
                    encountered bad parity in the tag address RAM or the tag control RAM while
                    one of BIU_HERR, BIU_SERR, BC_TPERR, or BC_TCPERR was already set.

FILL_ECC     RO     ECC error. This bit, when set, indicates that primary cache fill data
                    received from outside the CPU chip contained an ECC error.

FILL_DPERR   RO     Fill parity error. This bit, when set, indicates that the BIU received data
                    with a parity error from outside the CPU chip while performing either a
                    Dcache or Icache fill. FILL_DPERR is only meaningful when the CPU chip is
                    in parity mode, as opposed to ECC mode.

FILL_IRD     RO     This bit is only meaningful when either FILL_ECC or FILL_DPERR is set.
                    FILL_IRD is set to indicate that the error which caused FILL_ECC or
                    FILL_DPERR to set occurred during an Icache fill and clear to indicate
                    that the error occurred during a Dcache fill.

FILL_QW      RO     This field is only meaningful when either FILL_ECC or FILL_DPERR is set.
                    FILL_QW identifies the quadword within the hexaword primary cache fill
                    block which caused the error. It can be used together with
                    FILL_ADDR[33..5] to get the complete physical address of the bad quadword.

FILL_SEO     RO     This bit, when set, indicates that a primary cache fill operation resulted
                    in either an uncorrectable ECC error or in a parity error while FILL_ECC
                    or FILL_DPERR was already set.


3.10.4 BIU_ADDR 


This read-only register contains the physical address associated with errors reported by
BIU_STAT[7..0]. Its contents are meaningful only when one of BIU_HERR, BIU_SERR,
BC_TPERR, or BC_TCPERR is set. Reads of BIU_ADDR unlock both BIU_ADDR and
BIU_STAT[7..0].


In both EV3 and EV4, BIU_ADDR[33..5] contain the values of adr_h[33..5] associated with
the pin bus transaction which resulted in the error indicated in BIU_STAT[7..0].


In EV3, if the BIU_CMD field of the BIU_STAT register indicates that the transaction
which received the error was READ_BLOCK or LDx/L, then BIU_ADDR[4..3] identify which
quadword within the 32-byte cache block the CPU was attempting to read when the primary
cache miss occurred. This applies to both I-stream and D-stream reads. If the BIU_CMD
field of the BIU_STAT register encodes any pin bus command other than READ_BLOCK or
LDx/L, then BIU_ADDR[4..3] will contain zeros. BIU_ADDR[63..34] and BIU_ADDR[2..0]
always read as zero.


In EV4, if the BIU_CMD field of the BIU_STAT register indicates that the transaction
which received the error was READ_BLOCK or LDx/L, then BIU_ADDR[4..2] are
UNPREDICTABLE. If the BIU_CMD field of the BIU_STAT register encodes any pin bus
command other than READ_BLOCK or LDx/L, then BIU_ADDR[4..2] will contain zeros.
BIU_ADDR[63..34] and BIU_ADDR[1..0] always read as zero.


3.10.5 FILL_ADDR


This read-only register contains the physical address associated with errors reported by
BIU_STAT[14..8]. Its contents are meaningful only when FILL_ECC or FILL_DPERR is set. Reads
of FILL_ADDR unlock FILL_ADDR, BIU_STAT[14..8] and FILL_SYNDROME.


In both EV3 and EV4, FILL_ADDR[33..5] identify the 32-byte cache block which the CPU
was attempting to read when the error occurred.


In EV3, FILL_ADDR[4..3] identify the quadword within the cache block which the CPU was
attempting to read when the primary cache fill request was generated. FILL_ADDR[63..34]
and FILL_ADDR[2..0] read as zero.


In EV4, if the FILL_IRD bit of the BIU_STAT register is clear, indicating that the error
occurred during a D-stream cache fill, then FILL_ADDR[4..2] contain bits [4..2] of the physical
address generated by the load instruction which triggered the cache fill. If FILL_IRD is set,
then FILL_ADDR[4..2] are UNPREDICTABLE. FILL_ADDR[63..34] and FILL_ADDR[1..0]
read as zero.
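
The physical address of the failing quadword is formed from the block address in FILL_ADDR[33..5] and the FILL_QW field of BIU_STAT. The sketch below assumes FILL_QW is a two-bit quadword index (a 32-byte block holds four quadwords) that has already been extracted from BIU_STAT; the function name is illustrative.

```python
def bad_quadword_address(fill_addr: int, fill_qw: int) -> int:
    """Combine FILL_ADDR[33..5] with FILL_QW to address the bad quadword."""
    block_mask = ((1 << 34) - 1) & ~0x1F   # keep bits [33..5] only
    block = fill_addr & block_mask
    return block | ((fill_qw & 0x3) << 3)  # quadword index selects bits [4..3]
```

Bits [4..0] of FILL_ADDR are deliberately discarded, because on EV4 bits [4..2] may be UNPREDICTABLE and they do not necessarily name the quadword that failed.
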


3.10.6 FILL_SYNDROME


The FILL_SYNDROME register is a 14-bit read-only register.


If the chip is in ECC mode and an ECC error is recognized during a primary cache
fill operation, the syndrome bits associated with the bad quadword are locked in the
FILL_SYNDROME register. FILL_SYNDROME[6..0] contain the syndrome associated with
the lower longword of the quadword, and FILL_SYNDROME[13..7] contain the syndrome
associated with the higher longword of the quadword. A syndrome value of zero means that
no errors were found in the associated longword. See Table 3-25 for a list of syndromes
associated with correctable single-bit errors. The FILL_SYNDROME register is unlocked
when the FILL_ADDR register is read.


If the chip is in parity mode and a parity error is recognized during a primary cache fill
operation, the FILL_SYNDROME register indicates which of the longwords in the quadword
got bad parity. FILL_SYNDROME[0] is set to indicate that the low longword was corrupted,
and FILL_SYNDROME[7] is set to indicate that the high longword was corrupted.
FILL_SYNDROME[13..8] and [6..1] are RAZ in parity mode.


 6           1 1          0 0          0
 3           4 3          7 6          0
+-------------+------------+------------+
|             |            |            |
|     RAZ     |  HI[6..0]  |  LO[6..0]  |
|             |            |            |
+-------------+------------+------------+


Table 3-25: Syndromes for Single-Bit Errors


Data Bit   Syndrome(Hex)     Check Bit   Syndrome(Hex)

00         4F                00          01
01         4A                01          02
02         52                02          04
03         54                03          08
04         57                04          10
05         58                05          20
06         5B                06          40
07         5D
08         23
09         25
10         26
11         29
12         2A
13         2C
14         31
15         34
16         0E
17         0B
18         13
19         15
20         16
21         19
22         1A
23         1C
24         62
25         64
26         67
27         68
28         6B
29         6D
30         70
31         75
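
Since EVx leaves single-bit correction to software, a PALcode handler has to map a syndrome back to the failing bit. The sketch below shows that lookup using the values of Table 3-25. The function names are illustrative, and treating every non-zero syndrome missing from the table as "not-single-bit" is an assumption of this sketch, not a statement from the specification.

```python
# Syndromes for data bits 0..31, in order, from Table 3-25.
DATA_BIT_SYNDROMES = [
    0x4F, 0x4A, 0x52, 0x54, 0x57, 0x58, 0x5B, 0x5D,
    0x23, 0x25, 0x26, 0x29, 0x2A, 0x2C, 0x31, 0x34,
    0x0E, 0x0B, 0x13, 0x15, 0x16, 0x19, 0x1A, 0x1C,
    0x62, 0x64, 0x67, 0x68, 0x6B, 0x6D, 0x70, 0x75,
]


def classify_syndrome(syndrome: int):
    """Return (kind, bit) for one 7-bit longword syndrome."""
    if syndrome == 0:
        return ("none", None)
    if syndrome in DATA_BIT_SYNDROMES:
        return ("data", DATA_BIT_SYNDROMES.index(syndrome))
    if syndrome <= 0x40 and syndrome & (syndrome - 1) == 0:
        # Check bit n has syndrome 1 << n (check bits 0..6).
        return ("check", syndrome.bit_length() - 1)
    return ("not-single-bit", None)


def decode_fill_syndrome(value: int) -> dict:
    """Split FILL_SYNDROME into its low and high longword syndromes (ECC mode)."""
    return {
        "low": classify_syndrome(value & 0x7F),          # FILL_SYNDROME[6..0]
        "high": classify_syndrome((value >> 7) & 0x7F),  # FILL_SYNDROME[13..7]
    }


def correct_longword(data: int, syndrome: int) -> int:
    """Flip the failing data bit, if the syndrome names one."""
    kind, bit = classify_syndrome(syndrome)
    if kind == "data":
        return data ^ (1 << bit)
    if kind in ("none", "check"):
        return data  # the data bits themselves are intact
    raise ValueError("syndrome does not identify a single-bit error")
```
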


3.10.7 BC_TAG 


BC_TAG is a read-only IPR. Unless locked, the BC_TAG register is loaded with the results
of every backup cache tag probe. When a tag or tag control parity error or primary fill data
error (parity or ECC) occurs, this register is locked against further updates. Software may
read the LSB of this register by using the HW_MFPR instruction. Each time an HW_MFPR
from BC_TAG completes, the contents of BC_TAG are shifted one bit position to the right, so
that the entire register may be read using a sequence of HW_MFPRs. Software may unlock
the BC_TAG register using a HW_MTPR to BC_TAG.


Successive HW_MFPRs from the BC_TAG register must be separated by at least one null
cycle.


+---------+--+--------------+--+--+--+--+--+
|         |  |              |  |  |  |  |  |
|   RAZ   |RO| TAG[33..17]  |RO|RO|RO|RO|RO|
|         |  |              |  |  |  |  |  |
+---------+--+--------------+--+--+--+--+--+
           |                 |  |  |  |  |
           |                 |  |  |  |  +---> HIT
           |                 |  |  |  +------> TAGCTL_P
           |                 |  |  +---------> TAGCTL_D
           |                 |  +------------> TAGCTL_S
           |                 +---------------> TAGCTL_V
           +---------------------------------> TAG_P


Unused tag bits in the TAG field of this register are always clear, based on the size of the
external cache as determined by the BC_SIZE field of the BIU_CTL register.
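
Because each HW_MFPR returns only the LSB and then shifts the register right, software reassembles BC_TAG one bit at a time. The sketch below models that loop. The callback `read_lsb` is a hypothetical stand-in for an HW_MFPR from BC_TAG followed by the required null cycle, and the default of 23 bits is an assumption derived from the fields in the diagram (HIT, four TAGCTL bits, TAG[33..17] and TAG_P).

```python
def read_bc_tag(read_lsb, nbits: int = 23) -> int:
    """Reassemble BC_TAG from successive single-bit reads, LSB first."""
    value = 0
    for position in range(nbits):
        # Each read returns the current LSB; the hardware then shifts right,
        # so the next read sees the next more significant bit.
        value |= (read_lsb() & 1) << position
    return value
```

Reading is destructive in the sense that the register contents are shifted out, so the loop should run once per locked error and the result kept in a PAL temporary.
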


3.11 ECC Error Correction


When in ECC mode EVx generates longword ECC on writes, and checks ECC on reads. EVx
does not include hardware to correct single-bit errors, however.


When an ECC error is recognized during a Dcache fill the BIU places the affected fill block into
the Dcache unchanged, validates the block and posts a machine check. The load instruction
which triggered the Dcache fill is completed by writing the requested longword(s) into the
register file. The longword(s) read by the load instruction may or may not have been the cause
of the error, but a machine check is posted either way. The Ibox will react to the machine check
by aborting instruction execution before any instruction issued subsequent to the load could
overwrite the register containing the load data, and vectoring to the PAL code machine check
handler. Sufficient state is retained in various status registers (see Section 3.12) for PAL
code to determine whether the error affects the longword(s) read by the load instruction,
and whether the error is correctable. In any event, PAL code must explicitly flush the
Dcache. If the longword containing the error was written into the register file, PAL code
must either correct it and restart the machine, or report an uncorrectable hardware error
to the operating system. Independent of whether the failing longword was read by the load
instruction, PAL may scrub memory by explicitly reading the longword with the physical/lock
variant of the HW_LD instruction, flipping the necessary bit, and writing the longword with
the physical/conditional variant of the HW_ST instruction. Note that when PAL rereads the
affected longword the hardware may report no errors, indicating that the longword has been
overwritten.


When an ECC error occurs during an Icache fill the BIU places the affected fill block into the 
Icache unchanged, validates the block and posts a machine check. The Ibox will vector to the 
PAL code machine check handler before it executes any of the instructions in the bad block. 
PAL code may then flush the Icache and scrub memory as described above. 


As compared with hardware error correction, this approach is vulnerable to single-bit errors 
which may occur during I-stream reads of the PAL code machine check handler, to single-bit 
errors which occur in multiple quadwords of a cache fill block, and to single-bit errors which 
occur as a result of multiple silo’ed load misses. 


3.12 Error Flows


The following sections give a summary of the hardware flows for various error conditions for 
both EV3 and EV4. 


3.12.1 EV3 Error Flows 


3.12.1.1 I-stream ECC error


• data put into Icache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_ECC, FILL_IRD set, FILL_SEO set if multiple errors occurred
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW's address
• FILL_SYNDROME contains syndrome bits associated with failing quadword
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT, DC_ADDR locked - contents are UNPREDICTABLE
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.1.2 D-stream ECC error


• data put into Dcache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_ECC set, FILL_IRD clear, FILL_SEO set if multiple errors occurred
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW's address
• FILL_SYNDROME contains syndrome bits associated with failing quadword
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_ADDR: contains PA bits [33:2] of location which the failing load instruction attempted
  to read
• DC_STAT: RA identifies register which holds the bad data. LW, LOCK, INT, VAX_FP
  identify type of load instruction
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.1.3 BIU: tag address parity error


• recognized at end of tag probe sequence
• lookup uses predicted parity so transaction misses the external cache
• BC_TAG holds results of external cache tag probe
• machine check
• BIU_STAT: BC_TPERR set
• BIU_ADDR holds address


3.12.1.4 BIU: tag control parity error


• recognized at end of tag probe sequence
• transaction forced to miss external cache
• BC_TAG holds results of external cache tag probe
• machine check
• BIU_STAT: BC_TCPERR set
• BIU_ADDR holds address


3.12.1.5 BIU: system transaction terminated with CACK_SERR


• CRD interrupt: NOT SUPPORTED BY EV3
• BIU_STAT: BIU_SERR set, BIU_CMD holds cReq_h[2..0]
• BIU_ADDR holds address


3.12.1.6 BIU: system transaction terminated with CACK_HERR


• machine check
• BIU_STAT: BIU_HERR set, BIU_CMD holds cReq_h[2..0]
• BIU_ADDR holds address


3.12.1.7 BIU: I-stream parity error (parity mode only)


• data put into Icache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_DPERR set, FILL_IRD set
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW's address
• FILL_SYNDROME identifies failing longword(s)
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT, DC_ADDR locked - contents are UNPREDICTABLE
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.1.8 BIU: D-stream parity error (parity mode only)


• data put into Dcache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_DPERR set, FILL_IRD clear
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW’s address
• FILL_SYNDROME identifies failing longword(s)
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_ADDR: contains PA bits [33:2] of location which the failing load instruction attempted
  to read
• DC_STAT: RA identifies register which holds the bad data. LW, LOCK, INT, VAX_FP
  identify type of load instruction
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.2 EV4 Error Flows


3.12.2.1 I-stream ECC error


• data put into Icache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_ECC, FILL_IRD set, FILL_SEO set if multiple errors occurred
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW’s address
• FILL_SYNDROME contains syndrome bits associated with failing quadword
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT locked - contents are UNPREDICTABLE
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.2.2 D-stream ECC error


• data put into Dcache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_ECC set, FILL_IRD clear, FILL_SEO set if multiple errors occurred
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW’s address
• FILL_ADDR[4..2] contain PA bits [4..2] of location which the failing load instruction
  attempted to read
• FILL_SYNDROME contains syndrome bits associated with failing quadword
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT: RA identifies register which holds the bad data. LW, LOCK, INT, VAX_FP
  identify type of load instruction
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.2.3 BIU: tag address parity error


• recognized at end of tag probe sequence
• lookup uses predicted parity so transaction misses the external cache
• BC_TAG holds results of external cache tag probe
• machine check
• BIU_STAT: BC_TPERR set
• BIU_ADDR holds address




3.12.2.4 BIU: tag control parity error


• recognized at end of tag probe sequence
• transaction forced to miss external cache
• BC_TAG holds results of external cache tag probe
• machine check
• BIU_STAT: BC_TCPERR set
• BIU_ADDR holds address


3.12.2.5 BIU: system external transaction terminated with CACK_SERR


• CRD interrupt
• BIU_STAT: BIU_SERR set, BIU_CMD holds cReq_h[2..0]
• BIU_ADDR holds address


3.12.2.6 BIU: system transaction terminated with CACK_HERR


• machine check
• BIU_STAT: BIU_HERR set, BIU_CMD holds cReq_h[2..0]
• BIU_ADDR holds address


3.12.2.7 BIU: I-stream parity error (parity mode only)


• data put into Icache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_DPERR set, FILL_IRD set
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW’s address
• FILL_SYNDROME identifies failing longword(s)
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT locked - contents are UNPREDICTABLE
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction


3.12.2.8 BIU: D-stream parity error (parity mode only)


• data put into Dcache unchanged, block gets validated
• machine check
• BIU_STAT: FILL_DPERR set, FILL_IRD clear
• FILL_ADDR[33..5] & BIU_STAT[FILL_QW] give bad QW’s address
• FILL_ADDR[4..2] contain PA bits [4..2] of location which the failing load instruction
  attempted to read
• FILL_SYNDROME identifies failing longword(s)
• BIU_ADDR, BIU_STAT[6..0] locked - contents are UNPREDICTABLE
• DC_STAT: RA identifies register which holds the bad data. LW, LOCK, INT, VAX_FP
  identify type of load instruction
• BC_TAG holds results of external cache tag probe if external cache was enabled for this
  transaction
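

The error flows above amount to a lookup from the BIU_STAT flags that are set to the error condition and the hardware response. The following Python sketch is illustrative only: the function name, the chip argument, and the modelling of BIU_STAT as a set of flag-name strings are assumptions made for this example, not a register layout or a PALcode interface.

```python
# Illustrative only: BIU_STAT is modelled as a set of flag names, not register bits.

def classify_biu_error(flags, chip="EV4"):
    """Return (condition, response) pairs for the BIU_STAT flags that are set."""
    # FILL_IRD set means the failing fill was an I-stream read.
    stream = "I-stream" if "FILL_IRD" in flags else "D-stream"
    results = []
    if "FILL_ECC" in flags:
        kind = "ECC error"
        if "FILL_SEO" in flags:
            kind += " (multiple errors occurred)"
        results.append((f"{stream} {kind}", "machine check"))
    if "FILL_DPERR" in flags:
        results.append((f"{stream} parity error", "machine check"))
    if "BC_TPERR" in flags:
        results.append(("tag address parity error", "machine check"))
    if "BC_TCPERR" in flags:
        results.append(("tag control parity error", "machine check"))
    if "BIU_SERR" in flags:
        # The CRD interrupt is not supported by EV3.
        response = "CRD interrupt" if chip == "EV4" else "CRD interrupt not supported"
        results.append(("transaction terminated with CACK_SERR", response))
    if "BIU_HERR" in flags:
        results.append(("transaction terminated with CACK_HERR", "machine check"))
    return results
```

A handler written along these lines would still need to read FILL_ADDR, FILL_SYNDROME, and BC_TAG as listed in each flow to locate the failing quadword.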


Chapter 4 
External Interface 


4.1 Overview 


The EVx chip connects directly to an external cache built from off-the-shelf static RAMs. 
Because building high-speed logic is very difficult in low-end systems, the chip controls the 
RAMs directly. The chip contains a programmable external cache interface, so that each 
system can make its own external cache speed and configuration tradeoffs. 


The clocks used by the external interface are generated by the chip, but the speed of the 
clocks is programmable, and is determined during chip reset. This allows each system to 
make its own external interface speed tradeoffs. EVx is also configured during reset to use 
either a 64-bit or 128-bit wide external data bus. The bulk of this chapter describes the chip’s 
operation in 128-bit mode, and Section 4.3 of this chapter describes details specific to 64-bit 
mode operation. 


4.2 Signals


The following table lists all of the signals on the EVx chip. In the "type" column, an "I" means
a pin is an input, an "O" means the pin is an output, and a "B" means the pin is tristate and
bidirectional.


Table 4-1: EVx Signals


Signal Name          Count  Type  Function

clkIn_h, _l              2  I     Clock input
testClkIn_h, _l          2  I     Clock input for testing
cpuClkOut_h              1  O     CPU clock output
sysClkOut1_h, _l         2  O     System clock output, normal
sysClkOut2_h, _l         2  O     System clock output, delayed
dcOk_h                   1  I     Power and clocks ok
reset_l                  1  I     Reset
icMode_h[1..0]           2  I     Icache Test Mode Selection
sRomOE_l                 1  O     Serial ROM output enable
sRomD_h                  1  I     Serial ROM data/Rx data
sRomClk_h                1  O     Serial ROM clock/Tx data
adr_h[33..5]            29  B     Address bus
data_h[127..0]         128  B     Data bus
check_h[27..0]          28  B     Check bit bus
dOE_l                    1  I     Data bus output enable
dWSel_h[1..0]            2  I     Data bus write data select
dRAck_h[2..0]            3  I     Data bus read data acknowledge
tagCEOE_h                1  O     tagCtl and tagAdr CE/OE
tagCtlWE_h               1  O     tagCtl WE
tagCtlV_h                1  B     Tag valid
tagCtlS_h                1  B     Tag shared
tagCtlD_h                1  B     Tag dirty
tagCtlP_h                1  B     Tag V/S/D parity
tagAdr_h[33..17]        17  I     Tag address
tagAdrP_h                1  I     Tag address parity
tagOk_h, _l              2  I     Tag access from CPU is ok
tagEq_l                  1  O     Tag compare output
dataCEOE_h[3..0]         4  O     data CE/OE, longword
dataWE_h[3..0]           4  O     data WE, longword
dataA_h[4..3]            2  O     data A[4..3]
holdReq_h                1  I     Hold request
holdAck_h                1  O     Hold acknowledge
cReq_h[2..0]             3  O     Cycle request
cWMask_h[7..0]           8  O     Cycle write mask
cAck_h[2..0]             3  I     Cycle acknowledge
iAdr_h[12..5]            8  I     Invalidate address
dInvReq_h                1  I     Invalidate request, Dcache
dMapWE_h                 1  O     Backmap WE, Dcache
irq_h[5..0]              6  I     Interrupt requests
vRef                     1  I     Input reference
eclOut_h                 1  I     Output mode selection
perf_cnt_h[1..0]         2  I     Performance counter inputs
tristate_l               1  I     Tristate for testing
cont_l                   1  I     Continuity for testing


Systems using EVx in 128-bit mode should ignore dataA_h[3] and tie dWSel_h[0] false. See 
Section 4.3 for 64-bit mode details. 


4.2.1 Clocks 


External logic supplies EVx with a differential clock at twice the desired internal clock 
frequency via the clkIn_h and clkIn_l pins. EVx divides this clock by two to generate the 
internal chip clock. 


The internal chip clock is supplied to the external interface via the cpuClkOut_h pin. The 
false-to-true transition of cpuClkOut_h is the "CPU clock" used in the timing specification for 
the tagOk_h, _l signals. 


The CPU clock is divided by a programmable value between 2 and 8 to generate a system 
clock, which is supplied to the external interface via the sysClkOut1_h and sysClkOut1_l 
pins. The system clock is delayed by a programmable number of CPU clock cycles between 0 
and 3 to generate a delayed system clock, which is supplied to the external interface via the 
sysClkOut2_h and sysClkOut2_l pins. 


The clock generator runs, generating cpuClkOut_h and correctly timed and positioned 
sysClkOut1 and sysClkOut2, while the chip is held in reset. 


The output of the programmable divider is symmetric if the divisor is even, and asymmetric 
with sysClkOut1_h TRUE for one extra CPU cycle if the divisor is odd. 
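

The divider and delay behaviour described above can be sketched as a per-CPU-cycle waveform generator. This is a minimal Python model, not a timing specification: the function name is hypothetical, and placing the false-to-true edge of sysClkOut1_h at CPU cycle 0 is an arbitrary assumption made so that the output has a fixed phase.

```python
def system_clock_waveforms(divisor, delay, n_cpu_cycles):
    """Return (sysClkOut1_h, sysClkOut2_h) levels, one entry per CPU cycle."""
    if not 2 <= divisor <= 8:
        raise ValueError("divisor must be between 2 and 8")
    if not 0 <= delay <= 3:
        raise ValueError("delay must be between 0 and 3")
    # Even divisor: symmetric. Odd divisor: TRUE for one extra CPU cycle.
    high_cycles = (divisor + 1) // 2

    def level(t):
        return 1 if (t % divisor) < high_cycles else 0

    clk1 = [level(t) for t in range(n_cpu_cycles)]
    # sysClkOut2 is sysClkOut1 delayed by `delay` CPU cycles (steady state).
    clk2 = [level(t - delay) for t in range(n_cpu_cycles)]
    return clk1, clk2
```

For example, a divisor of 3 yields a repeating high, high, low pattern, which shows the one extra TRUE cycle for odd divisors.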


The false-to-true transition of sysClkOut1_h is the "system clock" used as a timing reference 
throughout this specification. 


Almost all transactions on the external interface run synchronously to the CPU clock 
and phase aligned to the system clock, so the external interface appears to be running 
synchronously to the system clock (most setup and hold times are referenced to the system 
clock). The exceptions to this are the fast, EVx controlled transactions on the external caches 
and the sample of the tagOk_h, _l inputs, which are synchronous to the CPU clock, but 
independent of the system clock. 


4.2.2 DC_OK and Reset 


EVx contains a ring oscillator which is switched into service during power up to provide 
an internal chip clock. The dcOk_h signal switches clock sources between the on-chip ring 
oscillator and the external clock oscillator. If dcOk_h is false then the on-chip ring oscillator 
feeds the clock generator, and EVx is held in reset independent of the state of the reset_l 
signal. If dcOk_h is true then the external clock oscillator feeds the clock generator. When 
dcOk_h is true the vRef input must be valid so that inputs can be sensed. The dcOk_h signal 
is special in that it does not require that vRef be stable to be sensed. It is important to 
drive dcOk_h false until the voltage on vRef has stabilized. 
Because chip testers can apply clocks and power to the chip at the same time, the chip tester 
can always drive dcOk_h true, but the tester must drive reset_l true for a period longer than 
the minimum hold time of vRef. 


When EVx is running off the internal ring oscillator the clock outputs follow it, just like they 
would when real clocks are applied. The frequency of the ring oscillator varies from chip 
to chip within a range of 10 MHz to 100 MHz, which corresponds to an internal CPU clock 
frequency of between 5 MHz and 50 MHz. Also, when the dcOk_h signal is false, the system 
clock divisor is forced to eight, and the sysClkOut2_h, _l delay is forced to three. 


Note if the dcOk_h signal is generated by an RC delay, there is no check that the input 
clocks are really running. This means that if a board is powered up in manufacturing with a 
missing, defective, or mis-soldered clock oscillator then EVx will enter a possibly destructive 
high-current state. Furthermore, if a clock oscillator fails in stage 1 burn-in then EVx may 
also enter this state. The frequency and duration of such events need to be understood by 
the module designer to decide if this is really a problem. 


The reset_l signal forces the CPU into a known state - see Table 3-8. The reset_l signal may 
be asynchronous, and need not be asserted beyond the assertion of dcOk_h to guarantee that 
the EVx chip is properly reset. 


In order to bring the chip out of internal reset at a deterministic time, the reset_l pin may 
be deasserted synchronously with respect to the system clock. See Chapter 6 for the 
setup and hold requirements of the reset_l pin when used in this way. 


The EV3 and EV4 CPU chips use a 3.3V power supply. This 3.3V supply must be stable 
before any input goes above 4V. 


While in reset, EVx reads sysClkOut and external bus configuration information off the irq_h 
pins - external logic should drive the configuration information onto the irq_h pins any time 
reset_l is true. 


The irq_h[5] bit is used to select 128-bit or 64-bit mode. If irq_h[5] is true then 128-bit mode 
is selected. 


The irq_h[2..0] bits encode the value of the divisor used to generate the system clock from 
the CPU clock. 


Table 4-2: System Clock Divisor 


irq_h[2]    irq_h[1]    irq_h[0]    Ratio 
F           F           F           2 
F           F           T           3 
F           T           F           4 
F           T           T           5 
T           F           F           6 
T           F           T           7 
T           T           F           8 
T           T           T           8 


The irq_h[4..3] bits encode the delay, in CPU clock cycles, from sysClkOut1 to sysClkOut2. 


Table 4-3: System Clock Delay 


irq_h[4]    irq_h[3]    Delay 
F           F           0 
F           T           1 
T           F           2 
T           T           3 
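

The reset-time encodings on irq_h[5..0] (bus width, Table 4-2, and Table 4-3) can be decoded as in the following Python sketch. The function name and the convention that bit n of the integer argument is 1 when irq_h[n] is true are assumptions made for illustration.

```python
def decode_reset_config(irq):
    """Decode the value sampled from irq_h[5..0] during reset.

    irq is a 6-bit integer; bit n is 1 when irq_h[n] is true.
    """
    if not 0 <= irq < 64:
        raise ValueError("irq must be a 6-bit value")
    # Table 4-2: codes 0..6 give ratios 2..8; code 7 also gives 8.
    divisor = min(2 + (irq & 0b111), 8)
    # Table 4-3: irq_h[4..3] is the sysClkOut1-to-sysClkOut2 delay in CPU cycles.
    delay = (irq >> 3) & 0b11
    # irq_h[5] true selects 128-bit mode, otherwise 64-bit mode.
    bus_width = 128 if (irq >> 5) & 1 else 64
    return {"bus_width": bus_width, "divisor": divisor, "delay": delay}
```

For instance, `decode_reset_config(0b110110)` describes a 128-bit bus with a system clock divisor of 8 and a delay of 2 CPU cycles.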


When the tristate_l pin is asserted the chip is internally forced into the reset state, without 
resampling the interrupt pins. 


4.2.3 Initialization and Diagnostic Interface 


EV4 implements three Icache initialization modes to support chip and module level testing. 
The value placed on icMode_h[1..0] determines which of these modes is used after EV4 is 
reset. Unlike the value placed on irq_h[5..0] during reset, the value placed on icMode_h[1..0] 
must be retained after reset_l is deasserted. 


Table 4-4: Icache Test Modes 


icMode_h[1]    icMode_h[0]    Mode 
F              F              Serial ROM 
F              T              Disabled 
T              F              Icache Test - Write 
T              T              Icache Test - Read 


If the value on icMode_h[1..0] selects Serial ROM Mode, EV4 will load the contents of its 
internal Icache from an external serial ROM (such as an AMD Am1736) before executing 
its first instruction. The serial ROM could contain enough ALPHA code to complete the 
configuration of the external interface, e.g. setting the timing on the external cache RAMs, 
and diagnose the path between the CPU chip and the real ROM. EV4 is in PALmode following 
the deassertion of reset_l - this gives the code loaded into the Icache access to all of the visible 
state within the chip. 


Three signals are used to interface to the serial ROM. The sRomOE_l output signal supplies 
the output enable to the ROM, serving both as an output enable and as a reset (refer to the 
serial ROM specifications for details). The sRomClk_h output signal supplies the clock to 
the ROM that causes it to advance to the next bit. The ROM data is read by EVx via the 
sRomD_h input signal. 


Once the data in the serial ROM has been loaded into the Icache, the three special signals 
become simple parallel I/O pins that can be used to drive a diagnostic terminal. When the 
serial ROM is not being read, the sRomOE_l output signal is false. This pin can therefore be 
wired to the active high enable of an RS422 receiver driving onto sRomD_h (the 26LS32 will 
work) and to the active high enable of an RS422 driver driving from sRomClk_h (the 26LS31 
will work). The CPU allows sRomD_h to be read and sRomClk_h to be written by PALcode; this 
is sufficient hardware support to implement a bit-banged serial interface. 


Using the icMode_h[1..0] pins, the Icache diagnostic interface may be disabled altogether. In 
this case, since the Icache valid bits are cleared by reset, the first instruction fetch will miss 
the Icache. 


In addition to Serial ROM mode, EV4 includes two test modes which together allow chip 
tester hardware full read and write access to the Icache. Icache Test/Write Mode works 
exactly like Serial ROM mode except that bits are loaded into the Icache at a higher rate. 
Icache Test/Read Mode allows the contents of the Icache to be read in a bit-serial manner 
from the sRomOE_l pin. These two modes are available only to chip test hardware. Systems 
using EV4 must tie icMode_h[1] to FALSE. 


In EV4, all Icache bits are loaded from the diagnostic interface, including each block’s tag, 
ASN, ASM, valid and branch history bits. The Icache blocks are loaded in sequential order 
starting with block zero and ending with block 255. The order in which bits within each block 
are serially loaded is shown below: 


       bht   lw7   lw5   lw3   lw1    v    asm   asn   tag   lw6   lw4   lw2   lw0
      +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
   -->|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |>|   |
      +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+


Bits within each field are arranged such that high-order bits are on the 
left. The serial chain shifts to the right. 


EV3 does not implement the Icache Test/Write and Icache Test/Read modes described above. 
Further, the icMode_h[1] pin does not connect to the EV3 die. Also, for EV3 the serial 
ROM should contain only the bits of the instructions which are to be loaded into the Icache. 
When the Icache is loaded the valid bit in each cache block is set, and the tag is cleared. 
Conceptually, the data bits from the serial ROM are shifted into a 64-bit wide holding register 
and then written into the Icache 64 bits at a time. The bits from the serial ROM are shifted 
into this holding register from the least significant bit to the most significant bit. Quadwords 
are written into the Icache in increasing order starting with the quadword at byte address 
zero. 


4.2.4 Address Bus 


The tristate, bidirectional adr_h pins provide a path for addresses to flow between EVx and 
the rest of the system. The adr_h pins are connected to the buffers that drive the address 
pins of the external cache RAMs, and to the transceivers that are located between the EVx 
local address bus and the CPU module address bus. 


The address bus is normally driven by EVx. EVx stops driving the address bus during reset 
and during external cache hold. In the external cache hold state the address bus acts like 
an input, and the tagEq_l output is the result of an equality compare between adr_h and 
tagAdr_h. Only bits that are part of the cache tag, as specified by the BC_SIZE field of the 
BIU_CTL IPR, participate in the compare. The tagEq_l pin is asserted during external cache 
hold only if the result of the tag comparison is true, and the parity calculated across the 
appropriate bits of tagAdr_h matches the value on tagAdrP_h. Even parity is used. tagEq_l 
is deasserted when the address bus is not in the external cache hold state. 
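

The tagEq_l condition during external cache hold can be sketched as follows. This Python model is a sketch under stated assumptions: the BC_SIZE encoding is not given here, so the cache size is passed directly in bytes (a hypothetical parameter), and the tag is assumed to cover address bits [33..log2(cache size)], with bit 17 as the lowest tag bit for a 128 Kbyte cache.

```python
def even_parity_bit(value):
    """Parity bit that makes the total count of ones (value plus parity bit) even."""
    return bin(value).count("1") & 1


def tag_eq(adr, tag_adr, tag_adr_p, cache_bytes=128 * 1024):
    """Return True when tagEq_l would be asserted during external cache hold.

    adr and tag_adr are physical addresses with bits in their natural
    positions; cache_bytes is assumed to be a power of two (illustrative
    stand-in for the BC_SIZE field).
    """
    low_bit = cache_bytes.bit_length() - 1               # log2 of the cache size
    tag_mask = ((1 << 34) - 1) & ~((1 << low_bit) - 1)   # address bits [33..low_bit]
    address_match = (adr & tag_mask) == (tag_adr & tag_mask)
    # Parity is checked only over the bits that take part in the compare.
    parity_ok = even_parity_bit(tag_adr & tag_mask) == tag_adr_p
    return address_match and parity_ok
```

Only the tag bits participate, so addresses that differ below the lowest tag bit still compare equal.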


4.2.5 Data Bus 


The tristate, bidirectional data_h pins provide a path for data to flow between EVx and the 
rest of the system. The data_h pins connect directly to the I/O pins of the external cache data 
RAMs and to the transceivers that are located between the EVx local data bus and the CPU 
module data bus. 


The tristate, bidirectional check_h pins provide a path for check bits to flow between the CPU 
and the rest of the system. The check_h pins connect directly to the I/O pins of the external 
cache data RAMs and to the transceivers that are located between the EVx local check bus 
and the CPU module check bus. 


The data bus is driven by EVx when it is running a fast write cycle on the external caches, 
or when some type of write cycle has been presented to the external interface and external 
logic has enabled the data bus drivers (via dOE_l). 


If EVx is in ECC mode then the check_h pins carry 7 check bits for each longword on the 
data bus. Bits check_h[6..0] are the check bits for data_h[31..0]. Bits check_h[13..7] are the 
check bits for data_h[63..32]. Bits check_h[20..14] are the check bits for data_h[95..64]. Bits 
check_h[27..21] are the check bits for data_h[127..96]. 


The following ECC code is used. This code is the same one used by the IDT49C460 and 
AMD29C660 32-bit ECC generator/checker chips. 


            dddddddddddddddddddddddddddddddd
            33222222222211111111110000000000
            10987654321098765432109876543210


c6 XOR      xxxxxxxx................xxxxxxxx
c5 XOR      xxxxxxxx........xxxxxxxx........
c4 XOR      xx......xxxxxx..xx......xxxxxx..
c3 XNOR     ..xxx...xxx...xx..xxx...xxx...xx
c2 XNOR     x.x..xx.x..xx..xx.x..xx.x..xx..x
c1 XOR      ...x.x.x.x.x.xxx...x.x.x.x.x.xxx
c0 XOR      x.xx.x....x.xxx..x..x.xxxx.x...x

By arranging the data and check bits correctly, it is possible to arrange that any number of 
errors restricted to a 4-bit group can be detected. One such arrangement is as follows: 


d[00], d[01], d[03], d[25] 
d[02], d[04], d[06], c[06] 
d[05], d[07], d[12], c[03] 
d[08], d[09], d[11], d[14] 
d[10], d[13], d[15], d[19] 
d[16], d[17], d[22], d[28] 
d[18], d[23], d[30], c[05] 
d[20], d[27], c[04], c[00] 
d[21], d[26], c[02], c[01] 
d[24], d[29], d[31] 
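

Check-bit generation for one longword, and the packing of four 7-bit fields onto check_h[27..0], can be sketched in Python as below. The mask constants are a transcription of the participation matrix for the 49C460-style code and should be verified against the source table before use. The function names are hypothetical, and "XNOR" is taken to mean the complement of the XOR of the participating data bits.

```python
# Data-bit participation masks for c0..c6 (bit i set means d[i] participates).
# Transcribed for illustration; verify against the source matrix before use.
ECC_MASKS = [
    0xB42E4BD1,  # c0, XOR
    0x15571557,  # c1, XOR
    0xA699A699,  # c2, XNOR
    0x38E338E3,  # c3, XNOR
    0xC0FCC0FC,  # c4, XOR
    0xFF00FF00,  # c5, XOR
    0xFF0000FF,  # c6, XOR
]
XNOR_BITS = 0b0001100  # c3 and c2 are complemented


def ecc_check_bits(longword):
    """Return the 7 check bits c[6..0] for a 32-bit longword."""
    check = 0
    for i, mask in enumerate(ECC_MASKS):
        parity = bin(longword & mask & 0xFFFFFFFF).count("1") & 1
        check |= parity << i
    return check ^ XNOR_BITS


def ecc_syndrome(longword, check):
    """Non-zero when the stored check bits disagree with the data."""
    return ecc_check_bits(longword) ^ (check & 0x7F)


def check_h_ecc(data128):
    """Pack the check bits for data_h[127..0] into check_h[27..0]."""
    check_h = 0
    for lw in range(4):
        longword = (data128 >> (32 * lw)) & 0xFFFFFFFF
        check_h |= ecc_check_bits(longword) << (7 * lw)
    return check_h
```

With these masks, every single data-bit error produces a distinct syndrome with an odd number of bits set, which is what allows single-bit errors to be located and double-bit errors to be detected.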


If EVx is in PARITY mode then 4 of the check_h pins carry EVEN parity for each longword 
on the data bus, and the rest of the bits are unused. Bit check_h[0] is the parity bit for 
data_h[31..0]. Bit check_h[7] is the parity bit for data_h[63..32]. Bit check_h[14] is the parity 
bit for data_h[95..64]. Bit check_h[21] is the parity bit for data_h[127..96]. 


The ECC bit in the BIU_CTL IPR determines if EVx is in ECC mode or in PARITY mode. 


4.2.6 External Cache Control 


The external cache is a direct-mapped, write-back cache. EVx always views the external 
cache as having a tag for each 32-byte block (the same as the on-chip Icache and Dcache). 


The external cache tag RAMs are located between EVx’s local address bus and EVx’s tag 
inputs. The external cache data RAMs are located between the CPU’s local address bus and 
the CPU’s local data bus. EVx reads the external cache tag RAMs to determine if it can 
complete a cycle without any module level action, and EVx reads or writes the external cache 
data RAMs if, in fact, this is the case. 


A cycle requires no module level action if it is a non-LDxL read hit to a valid block, or a 
non-STxC write hit to a valid but not shared block. All other cycles require module level 
action. All cycles require module level action if the external cache is disabled (the BC_EN bit 
in the BIU_CTL IPR is cleared) or the physical address of the reference is in a quadrant in 
memory that is not cached, i.e. the appropriate bit in the BC_PA_DIS field in the BIU_CTL 
IPR is set for the quadrant of the reference. 
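

The rule for when a cycle needs module level action can be written as a small predicate. This Python sketch restates the paragraph above; the function and parameter names are hypothetical and do not correspond to signal or register names.

```python
def requires_module_action(is_write, is_locked_or_conditional, hit, valid, shared,
                           bc_enabled=True, quadrant_cached=True):
    """Return True when the cycle cannot be completed by EVx alone.

    is_locked_or_conditional marks LDxL reads and STxC writes.
    bc_enabled models BC_EN; quadrant_cached is False when the BC_PA_DIS
    bit for the quadrant of the reference is set.
    """
    if not bc_enabled or not quadrant_cached:
        return True              # external cache disabled or quadrant not cached
    if is_locked_or_conditional:
        return True              # LDxL and STxC always go to the module
    if not (hit and valid):
        return True              # miss, or hit on an invalid block
    if is_write and shared:
        return True              # writes need a valid, not shared block
    return False
```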


All EVx controlled cycles on the external cache have fixed timing, described in terms of EVx’s 
internal clock. The actual timing of the cycle is programmable (via the BC_RD_SPD, 
BC_WR_SPD, and BC_WE_CTL fields in the BIU_CTL IPR), allowing for much flexibility in the 
choice of CPU clock frequencies and cache RAM speeds. 


The external cache RAMs can be partitioned into three sections; the tagAdr RAM, the tagCtl 
RAM, and the data RAM. Sections do not straddle physical RAM chips. 


4.2.6.1 The TagAdr RAM 


The tagAdr RAM contains the high order address bits associated with the external cache 
block, along with a parity bit. The contents of the tagAdr RAM are fed to the on-chip address 
comparator and parity checker via the tagAdr_h and tagAdrP_h inputs. 


EVx verifies that tagAdrP_h is an EVEN parity bit over tagAdr_h when it reads the tagAdr 
RAM. If the parity is wrong, the tag probe results in a miss, and an external transaction is 
initiated. If machine checks are enabled (the MCHK_EN bit in the Abox_CTL IPR is set) 
EVx traps to PALcode. 


The number of bits of tagAdr_h that participate in the address compare and the parity check 
is controlled by the BC_SIZE field in the BIU_CTL IPR. The tagAdr_h signals go all the way 
down to address bit 17, allowing for a 128Kbyte cache built out of RAMs that are 8K deep. 


The chip enable or output enable for the tagAdr RAM is normally driven by a two input NOR 
gate (such as the 74AS805B). One input of the two input NOR gate is driven by tagCEOE_h, 
and the other input is driven by external logic. EVx drives tagCEOE_h false during reset, 
during external cache hold, and during any external cycle. The OE bit in the BIU_CTL IPR 
determines if tagCEOE_h has chip enable timing or output enable timing. 


4.2.6.2 The TagCtl RAM 


The tagCtl RAM contains control bits associated with the external cache block, along with a 
parity bit. EVx reads the tagCtl RAM via the three tagCtl signals to determine the state of 
the block. EVx writes the tagCtl RAM via the three tagCtl signals to make blocks dirty. 


EVx verifies that tagCtlP_h is an EVEN parity bit over tagCtlV_h, tagCtlS_h, and tagCtlD_h 
when it reads the tagCtl RAM. If the parity is wrong, the tag probe results in a miss, and an 
external transaction is initiated. If machine checks are enabled (the MCHK_EN bit in the 
Abox_CTL IPR is set) EVx traps to PALcode. EVx computes EVEN parity across the tagCtlV_h, 
tagCtlS_h, and tagCtlD_h bits, and drives the result onto the tagCtlP_h pin, when it writes 
the tagCtl RAM. 


The following combinations of the tagCtl RAM bits are allowed. Note that the bias toward 
conditional write-through coherence is really only in name; the tagCtlS_h bit can be viewed 
simply as a write protect bit. 


Table 4-5: Tag Control Encodings 


tagCtlV_h    tagCtlS_h    tagCtlD_h    Meaning 
F            X            X            Invalid 
T            F            F            Valid, private 
T            F            T            Valid, private, dirty 
T            T            F            Valid, shared 
T            T            T            Valid, shared, dirty 


EVx can satisfy a read probe if the tagCtl bits indicate the entry is valid (tagCtlV_h = T). 
EVx can satisfy a write probe if the tagCtl bits indicate the entry is valid and not shared 
(tagCtlV_h = T, tagCtlS_h = F). 
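

The tag control parity check and the read/write probe conditions can be sketched as follows. This is an illustrative Python model with hypothetical function names; each argument is 0 for F and 1 for T, and a parity error is modelled as a miss for both probe types, as described above.

```python
def tagctl_parity(v, s, d):
    """EVEN parity bit over V, S, D: total ones including the parity bit is even."""
    return (v ^ s ^ d) & 1


def tagctl_probe(v, s, d, p):
    """Return (parity_ok, read_ok, write_ok) for one read of the tagCtl RAM."""
    parity_ok = tagctl_parity(v, s, d) == p
    read_ok = parity_ok and v == 1                 # valid entry
    write_ok = parity_ok and v == 1 and s == 0     # valid and not shared
    return parity_ok, read_ok, write_ok
```

For a valid, private, clean block (V=1, S=0, D=0) the stored parity bit is 1, so `tagctl_probe(1, 0, 0, 1)` allows both read and write probes.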


The chip enable or output enable for the tagCtl RAM is normally driven by a two input NOR 
gate (such as the 74AS805B). One input of the two input NOR gate is driven by tagCEOE_h, 
and the other input is driven by external logic. EVx drives tagCEOE_h false during reset, 
during external cache hold, and during any external cycle. The OE bit in the BIU_CTL IPR 
determines if tagCEOE_h has chip enable timing or output enable timing. 


The write enable for the tagCtl RAM is normally driven by a two input NOR gate (such 
as the 74AS805B). One input of the two input NOR gate is driven by tagCtlWE_h, and the 
other input is driven by external logic. EVx drives tagCtlWE_h false during reset, during 
external cache hold, and during any external cycle. The BC_WE_CTL field in the BIU_CTL 
IPR determines the width of the write enable, and its position within the write cycle. 


4.2.6.3 The Data RAM 


The data RAM contains the actual cache data, along with any ECC or parity bits. 


The most significant bits of the data RAM address are driven, via buffers, from the address 
bus. The least significant bit of the data RAM address is driven by a two input NOR gate 
(such as the 74AS805B). One of the inputs of the two input NOR gate is driven by 
dataA_h[4], and the other input is driven by external logic. EVx drives dataA_h[4] false during 
reset, during external cache hold, and during any external cycle. 


The chip enables or output enables for the data RAM are driven by a two input NOR gate 
(such as the 74AS805B). One input of the two input NOR gate is driven by dataCEOE_h[3..0], 
and the other input is driven by external logic. EVx drives dataCEOE_h[3..0] false during 
reset, during external cache hold, and during external cycles. The OE bit in the BIU_CTL 
IPR determines if dataCEOE_h[3..0] has chip enable timing or output enable timing. 


The write enables for the data RAM are normally driven by a two input NOR gate (such as 
the 74AS805B). One input of the two input NOR gate is driven by dataWE_h[3..0], and the 
other input is driven by external logic. EVx drives dataWE_h[3..0] false during reset, during 
external cache hold, and during any external cycle. The BC_WE_CTL field in the BIU_CTL 
IPR determines the width of the write enable, and its position within the write cycle. 


4.2.6.4 Backmap 


Some systems may wish to maintain a backmap of the contents of the primary data cache to 
improve the quality of their invalidate filtering. EVx must maintain the backmap for external 
cache read hits, since external cache read hits are controlled totally by EVx. External logic 
maintains the backmaps for external cycles (read misses, invalidates, and so on). 


The backmap is only consulted by external logic, so that its format, or, for that matter, its 
existence, is of no concern to EVx. All EVx does is generate a backmap write pulse at the right 
time. Simple systems will not bother to maintain a backmap, will not connect the backmap 
write pulse to anything, and will generate extra invalidates. 


The write enable input of the data cache backmap RAM is driven by a two input NOR gate 
(such as the 74AS805B). One side of the two input NOR gate is driven by dMapWE_h, and 
the other input is driven by external logic. The CPU drives a write pulse onto dMapWE_h 
whenever it fills the on-chip data cache from the external cache. 


In 128-bit mode the dMapWE_h and iMapWE_h[1..0] signals assert one CPU cycle into the 
second (last) data read cycle, and negate one CPU cycle from the end of that cycle. If read 
cycles are 3 CPU cycles long, then dMapWE_h is one CPU cycle long. See Section 4.3 for 
64-bit mode operations. 


[Implementation Note: This anomaly is caused by the fact that the backmap write overlaps 
a cycle whose length is specified by BC_RD_SPD. If we used the standard write pulse timing 
mechanism, and BC_WR_SPD were longer than BC_RD_SPD, the address would go away in 
the middle of the write cycle.] 


4.2.6.5 External Cache Access 


The external caches are normally controlled by EVx. Two methods exist for gaining access to 
the external cache RAMs. 


4.2.6.5.1 HoldReq and HoldAck 


The simple method for external logic to access the external caches is to assert the holdReq_h 
signal. When holdReq_h is asserted, EVx finishes any external cache cycle which may be in 
progress, tristates adr_h, data_h, check_h, tagCtlV_h, tagCtlD_h, tagCtlS_h and tagCtlP_h, 
drives tagCEOE_h, tagCtlWE_h, dataCEOE_h, dataWE_h and dataA_h false, and asserts 
holdAck_h - the cReq_h and cWMask_h signals are not modified in any way. When external 
logic is finished with the external caches it deasserts holdReq_h. When EVx detects the 
deassertion of holdReq_h it deasserts holdAck_h and re-enables its outputs. 


The holdReq_h signal is synchronous, and external logic must guarantee setup and hold 
requirements with respect to the system clock. The holdAck_h signal is synchronous to the 
CPU clock but phase aligned to the system clock, so it can be used as an input to state 
machines running off the system clock. 


EVx generates the holdAck_h signal such that it may be tied directly to the enable-inputs of 
external tristate drivers which connect to the bidirectional pin bus signals. EVx will turn off 
its tristate drivers on or before the system clock edge at which it asserts holdAck_h, and will 
turn on its tristate drivers two CPU cycles after the system clock edge at which it deasserts 
holdAck_h. 


The delay from holdReq_h assertion to holdAck_h assertion depends on the programming of 
the external interface, and on exactly how the system clock is aligned with a pending external 
cache cycle. In the best case the external cache is idle or is just about to start a cycle, in 
which case holdAck_h asserts one system clock cycle after the system clock edge at which 
EVx samples the holdReq_h assertion. In the worst case the system clock edge at which EVx 
samples the holdReq_h assertion happens one CPU clock cycle into an external cache write 
probe that hits on a non-shared line and requires two RAM data cycles to complete. In this
case holdAck_h asserts at the first system clock edge that is at least ((BC_RD_SPD + 1) - 1)
+ 2*(BC_WR_SPD + 1) + 1 CPU cycles after the system clock edge at which EVx sampled the
holdReq_h assertion.
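
The worst-case bound above can be worked through numerically. The following is a minimal sketch, not part of the chip definition: the function names are hypothetical, and the parameter sysclk_ratio (CPU cycles per system clock cycle) is an assumption introduced only to round the bound up to a system clock edge.

```python
def worst_case_hold_latency(bc_rd_spd, bc_wr_spd):
    """Minimum CPU cycles before holdAck_h can assert in the worst case."""
    remaining_probe = (bc_rd_spd + 1) - 1   # write probe is already one CPU cycle old
    two_data_writes = 2 * (bc_wr_spd + 1)   # two RAM data write cycles
    return remaining_probe + two_data_writes + 1


def first_sysclk_edge_at_or_after(cpu_cycles, sysclk_ratio):
    """Round a CPU cycle count up to a system clock edge.

    Both counts are measured from the system clock edge at which
    holdReq_h was sampled, so edges fall on multiples of sysclk_ratio.
    """
    return -(-cpu_cycles // sysclk_ratio) * sysclk_ratio


if __name__ == "__main__":
    bound = worst_case_hold_latency(bc_rd_spd=3, bc_wr_spd=3)
    print(bound, first_sysclk_edge_at_or_after(bound, sysclk_ratio=5))
```

With 4 cycle reads and 4 cycle writes the bound is 12 CPU cycles; the actual assertion happens at the first system clock edge at or after that point.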


HoldAck_h deasserts in the system clock cycle immediately following the system clock edge 
at which EVx samples the deassertion of holdReq_h. 


A holdReq_h/holdAck_h sequence can happen at any time, even in the middle of an external
transaction. In this case all of the acknowledge-like signals (dOE_l, dWSel_h, dRAck_h,
cAck_h) work normally, although since EVx has forced most of its outputs to either tristate or
false, doing anything useful with them is difficult.


The assertion of holdReq_h prevents EVx’s BIU sequencer from starting new CPU requests.
However, if the BIU sequencer has already started an external cache tag probe when
holdReq_h is asserted, and the result of the tag probe is such that an external transaction is
required to complete the CPU’s request, the BIU sequencer will initiate the external transaction
by driving the cReq_h signals to the appropriate value despite holdReq_h’s assertion. HoldAck_h
will assert at the next system clock edge after the tag probe completes.


Note that since EVx doesn’t turn on its tristate drivers until two CPU cycles after it deasserts
holdAck_h, care must be taken as to when external logic begins processing new external
transactions at the tail end of a holdReq_h/holdAck_h sequence.


4.2.6.5.2 TagOk 


The fastest way for external logic to gain access to the external caches is to use the tagOk_h
and tagOk_l signals. TagOk_h and tagOk_l are EVx bus interface control signals which allow
external logic to stall a CPU cycle on the external cache RAMs at the last possible instant. All
tradeoffs surrounding these signals have been made in favor of high-performance systems,
making them next to impossible to use in low-end systems.


The tagOk_h and tagOk_l signals are synchronous, and external logic must guarantee setup
and hold requirements with respect to the CPU clock. This implies very fast logic, since the
CPU clock may run at 200 MHz for the binned parts.


Furthermore, the only thing that tagOk does is stall a sequencer in the EVx bus interface
unit. EVx does not tri-state the busses that run between EVx and the external cache RAMs.
External logic must supply the necessary multiplexing functions in the address and data
path.


If tagOk is true at a CPU clock edge, the external logic is guaranteeing that the tagCtl
and tagAdr RAMs were owned by EVx in the previous BC_RD_SPD+1 CPU cycles, that the
tagCtl RAMs will be owned by EVx in the next BC_WR_SPD+1 CPU cycles, that the data RAMs
were owned by EVx in the previous BC_RD_SPD+1 CPU cycles, and that the data RAMs will be
owned by EVx in the next BC_RD_SPD+1 CPU cycles or in the next 2*(BC_WR_SPD+1) CPU
cycles, whichever is longer.
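
The ownership guarantee that accompanies a true tagOk sample can be tabulated for a given RAM speed programming. This is a minimal sketch for 128-bit mode only; the function name and dictionary keys are hypothetical, and the garbled field name in the "previous" data RAM window is read here as BC_RD_SPD.

```python
def tagok_ownership_windows(bc_rd_spd, bc_wr_spd):
    """CPU cycles of EVx RAM ownership implied by tagOk true (128-bit mode)."""
    read_cycle = bc_rd_spd + 1    # length of one RAM read cycle
    write_cycle = bc_wr_spd + 1   # length of one RAM write cycle
    return {
        "tag_previous": read_cycle,    # tagCtl and tagAdr RAMs, looking back
        "tag_next": write_cycle,       # tagCtl RAMs, looking ahead
        "data_previous": read_cycle,   # data RAMs, looking back
        # Data RAMs, looking ahead: one more read or two writes, whichever is longer.
        "data_next": max(read_cycle, 2 * write_cycle),
    }


if __name__ == "__main__":
    print(tagok_ownership_windows(bc_rd_spd=3, bc_wr_spd=3))
```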


The bus interface unit samples tagOk in the last two cycles of each tag probe, and only
proceeds if tagOk was asserted in both of these cycles. Two cycles of tagOk assertion, rather
than one, were chosen to alleviate a tight circuit path inside the chip.
This choice in no way impacts the above stated use of tagOk by external logic. If EVx samples
tagOk as false in either of the last two CPU cycles of a tag probe then it stalls until it samples
tagOk true in consecutive cycles (at which time all of the above assertions are true, which
means, in particular, that any address EVx has been holding on the address bus all this time
has made it through the external cache RAMs), and then it proceeds normally.
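
The stall rule can be sketched as a scan over per-cycle tagOk samples. This is an illustrative model only, not a description of the sequencer circuit: the function name is hypothetical, and sample 0 is assumed to be the second-to-last CPU cycle of the tag probe.

```python
def proceed_cycle(tagok_samples):
    """Return the index of the sample at which the BIU proceeds, or None.

    The BIU proceeds at the first sample that is true and whose
    immediately preceding sample was also true.
    """
    previous = False
    for index, sample in enumerate(tagok_samples):
        if previous and sample:
            return index
        previous = sample
    return None  # still stalled at the end of the observed samples


if __name__ == "__main__":
    print(proceed_cycle([True, True]))               # no stall
    print(proceed_cycle([True, False, True, True]))  # stalls for two cycles
```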


4.2.7 External Cycle Control 


EVx requests an external cycle when it determines that the cycle it wants to run requires 
module level action. 


An external cycle begins when EVx puts a cycle type onto the cReq_h outputs. Some cycles 
put an address on the adr_h outputs, and additional information (low-order address bits, 
I/D stream indication, write masks) on the cWMask_h outputs. All of these outputs are 
synchronous, and EVx meets setup and hold requirements with respect to the system clock. 


The cycle types are as follows. 


Table 4-6: Cycle Types


cReq_h[2]  cReq_h[1]  cReq_h[0]  Type
F          F          F          IDLE
F          F          T          BARRIER
F          T          F          FETCH
F          T          T          FETCHM
T          F          F          READ_BLOCK
T          F          T          WRITE_BLOCK
T          T          F          LDxL
T          T          T          STxC
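
As an illustration of how external logic might decode the cycle type, the sketch below treats cReq_h[2..0] as a 3-bit index. It assumes the table rows run in binary order from FFF (IDLE) to TTT (STxC); the names CYCLE_TYPES and decode_creq are hypothetical.

```python
CYCLE_TYPES = (
    "IDLE", "BARRIER", "FETCH", "FETCHM",
    "READ_BLOCK", "WRITE_BLOCK", "LDxL", "STxC",
)


def decode_creq(creq2, creq1, creq0):
    """Map the three cReq_h levels (True = T, False = F) to a cycle type name."""
    index = (int(creq2) << 2) | (int(creq1) << 1) | int(creq0)
    return CYCLE_TYPES[index]


if __name__ == "__main__":
    print(decode_creq(True, False, False))
```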


A BARRIER cycle is generated by the MB instruction. Normally all the module does with 
this cycle is acknowledge it. Modules which have write buffers between EVx and the memory 
system must drain these buffers before the cycle is acknowledged to guarantee that machine 
checks caused by transport and/or memory system errors get posted on the correct side of the 
MB instruction. 


The FETCH and FETCHM cycles are generated by the FETCH and FETCHM instructions, 
respectively. The address bus contains the effective address of the FETCH or FETCHM 
instruction. These addresses can be used by module level prefetching logic. Simple systems 
simply acknowledge the cycles. 


The READ_BLOCK cycle is generated on read misses. External logic reads the addressed 
block from memory and supplies it, 128 bits at a time, to EVx via the data bus. External 
logic may also write the data into the external cache, after perhaps writing a victim. 


The WRITE_BLOCK cycle is generated on write misses, and on writes to shared blocks. 
External logic pulls the write data, 128 bits at a time, from EVx via the data bus, and writes 
the valid longwords to memory. External logic may also write the data into the external 
cache, after perhaps writing a victim. 


The LDxL cycle is generated by the interlocked load instructions. The cycle works just like a 
READ_BLOCK, although the external cache has not been probed (so the external logic needs 
to check for hits), and the address has to be latched into a locked address register. 


The STxC cycle is generated by the conditional store instructions. The cycle works just like
a WRITE_BLOCK, although the external cache has not been probed (so the external logic
needs to check for hits), and the cycle can be acknowledged with a failure status.


On WRITE_BLOCK and STxC cycles the cWMask_h pins supply longword write masks to the
external logic, indicating which longwords in the 32-byte block are, in fact, valid. A
cWMask_h bit is true if the longword is valid. WRITE_BLOCK commands can have any
combination of mask bits set. STxC cycles can only have combinations that correspond to a
single longword or quadword.
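
The write mask rules can be checked with a few lines of code. This sketch makes two assumptions that the text does not state: that cWMask_h bit i corresponds to longword i of the block, and that a quadword means an aligned pair of longwords (2k and 2k+1). The function names are hypothetical.

```python
def valid_longwords(cwmask):
    """List the longword numbers (0..7) marked valid by an 8-bit write mask."""
    return [i for i in range(8) if (cwmask >> i) & 1]


def is_legal_stxc_mask(cwmask):
    """True if the mask selects exactly one longword or one aligned quadword."""
    longwords = valid_longwords(cwmask)
    if len(longwords) == 1:
        return True
    return (
        len(longwords) == 2
        and longwords[0] % 2 == 0
        and longwords[1] == longwords[0] + 1
    )


if __name__ == "__main__":
    print(valid_longwords(0b00001100), is_legal_stxc_mask(0b00001100))
```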


On READ_BLOCK and LDxL cycles the cWMask_h pins have additional information about
the miss overloaded onto them. The cWMask_h[1..0] pins contain miss address bits [4..3]
(indicating the address of the quadword that actually missed), which is needed to implement
quadword read granularity to I/O devices. The cWMask_h[2] pin is true if the miss is a
D-stream reference, and false if the miss is an I-stream reference.
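
A short sketch of how external logic might unpack this overloaded information on a read. The function name and the returned field names are hypothetical; only cWMask_h[2..0] are examined.

```python
def decode_read_cwmask(cwmask):
    """Unpack cWMask_h[2..0] as driven on READ_BLOCK and LDxL cycles."""
    quadword = cwmask & 0b11                 # miss address bits [4..3]
    return {
        "quadword": quadword,
        "byte_offset": quadword * 8,         # offset of that quadword in the block
        "d_stream": bool((cwmask >> 2) & 1), # True = D-stream, False = I-stream
    }


if __name__ == "__main__":
    print(decode_read_cwmask(0b110))
```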


The cycle remains on the external interface until external logic acknowledges it, by placing an 
acknowledgment type on the cAck_h pins. The cAck_h inputs are synchronous, and external 
logic must guarantee setup and hold requirements with respect to the system clock. 


The acknowledgment types are as follows. 


Table 4-7: Acknowledgment Types 


cAck_h[2]  cAck_h[1]  cAck_h[0]  Type
F          F          F          IDLE
F          F          T          HARD_ERROR
F          T          F          SOFT_ERROR
F          T          T          STxC_FAIL
T          F          F          OK


EVx behavior in response to cAck_h encodings other than those listed above is UNDEFINED.


The HARD_ERROR type indicates that the cycle has failed in some catastrophic manner. EVx
latches sufficient state to determine the cause of the error, and initiates a machine check.


The SOFT_ERROR type indicates that a failure occurred during the cycle, but the failure 
was corrected. EVx latches sufficient state to determine the cause of the error, and initiates 
a corrected error interrupt. 


The STxC_FAIL type indicates that a STxC cycle has failed. It is UNDEFINED what happens
if this type is used on anything but an STxC cycle.


The OK type indicates success. 
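
The acknowledgment encodings can be modeled as a small lookup. This is a sketch for a simulator or testbench, with hypothetical names; it assumes the encodings FFF, FFT, FTF, FTT and TFF for IDLE, HARD_ERROR, SOFT_ERROR, STxC_FAIL and OK, and it raises an exception wherever the text says behavior is UNDEFINED, which is a modeling choice rather than chip behavior.

```python
CACK_TYPES = {
    0b000: "IDLE",
    0b001: "HARD_ERROR",
    0b010: "SOFT_ERROR",
    0b011: "STxC_FAIL",
    0b100: "OK",
}


def decode_cack(code, cycle_type):
    """Return the acknowledgment name for cAck_h[2..0] on the given cycle type."""
    if code not in CACK_TYPES:
        raise ValueError("cAck_h encoding with UNDEFINED behavior: %s" % bin(code))
    ack = CACK_TYPES[code]
    if ack == "STxC_FAIL" and cycle_type != "STxC":
        raise ValueError("STxC_FAIL on a non-STxC cycle is UNDEFINED")
    return ack


if __name__ == "__main__":
    print(decode_cack(0b100, "READ_BLOCK"))
```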


The dRAck_h pins inform EVx that read data is valid on the data bus, whether the data should
be cached, and whether ECC checking and correction or parity checking should be attempted.
The dRAck_h inputs are synchronous, and external logic must guarantee setup and hold
requirements with respect to the system clock. If dRAck_h is sampled IDLE at a system
clock then the data bus is ignored. If dRAck_h is sampled non-IDLE at a system clock then
the data bus is latched at that system clock, and external logic must guarantee that the data
meets setup and hold with respect to the system clock.


The acknowledgment types are as follows. 


Table 4-8: Read Data Acknowledgment Types


dRAck_h[2]  dRAck_h[1]  dRAck_h[0]  Type
F           F           F           IDLE
T           F           F           OK_NCACHE_NCHK
T           F           T           OK_NCACHE
T           T           F           OK_NCHK
T           T           T           OK


EVx behavior in response to dRAck_h encodings other than those listed above is UNDEFINED.
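
The read data acknowledgment encodings reduce to two independent flags once data is valid. The sketch below uses hypothetical names and raises an exception for the unlisted encodings, which is a modeling choice for a testbench rather than chip behavior.

```python
def decode_drack(code):
    """Decode dRAck_h[2..0].

    Returns None for IDLE, otherwise a dict with the cache and check flags.
    """
    if code == 0b000:
        return None                      # IDLE: the data bus is ignored
    if not code & 0b100:
        raise ValueError("dRAck_h encoding with UNDEFINED behavior: %s" % bin(code))
    return {
        "cache": bool(code & 0b010),     # dRAck_h[1]: cache the block
        "check": bool(code & 0b001),     # dRAck_h[0]: ECC/parity check the data
    }


if __name__ == "__main__":
    print(decode_drack(0b101))  # OK_NCACHE
```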


The first non-IDLE sample of dRAck_h tells EVx to sample data bytes [15..0], and the second
non-IDLE sample of dRAck_h tells EVx to sample data bytes [31..16]. External logic may
drive the second dRAck_h and the cAck_h during the same system clock.


READ_BLOCK and LDxL transactions may be terminated with HARD_ERROR status before
all expected dRAck_h cycles are received. Here the behavior of EV3 and EV4 differs
slightly. In EV3 the affected primary cache block is invalidated, and its data contents are
UNPREDICTABLE. In EV4 the contents of the entire cache block, including its tag and valid
bit, are UNPREDICTABLE. In both EV3 and EV4 a machine check is posted.


EVx may use D-stream primary cache fill data as soon as it is received, including data
received in the first half of a READ_BLOCK transaction which is later terminated with
HARD_ERROR. EVx does not use any I-stream primary cache fill data until it successfully
receives the entire cache block.


EVx does not change its interpretation of dRAck_h[1..0] based on cAck_h if all expected 
dRAck’s are received, so external logic must avoid caching and/or ECC/parity checking data 
which is known to be garbage if it cares. 


EVx behavior is UNDEFINED if dRAck_h is asserted in a non-read cycle. 


EVx checks dRAck_h[0] (the bit that determines if the block is ECC/parity checked) during
both halves of the 32-byte block. It is legal, but probably not useful, to check only one half of
the block.


EV3 checks dRAck_h[1] (the bit that determines if the block is cached or not) during the
second half of the 32-byte block. EV4 checks dRAck_h[1] during the first half of the 32-byte
block.
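
The EV3/EV4 difference matters to external logic that varies the dRAck_h code between halves. A minimal sketch, with hypothetical names, that picks the deciding dRAck_h sample for each chip:

```python
def block_is_cached(chip, drack_codes):
    """Decide whether a fill is cached, given the non-IDLE dRAck_h codes in order.

    EV3 looks at dRAck_h[1] on the last sample of the block,
    EV4 looks at dRAck_h[1] on the first sample.
    """
    if chip == "EV3":
        deciding = drack_codes[-1]
    elif chip == "EV4":
        deciding = drack_codes[0]
    else:
        raise ValueError("unknown chip: %r" % (chip,))
    return bool(deciding & 0b010)


if __name__ == "__main__":
    codes = [0b111, 0b101]  # OK on the first half, OK_NCACHE on the second
    print(block_is_cached("EV3", codes), block_is_cached("EV4", codes))
```

Driving the same code on every sample of a block sidesteps the difference entirely.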


The dOE_l input tells EVx if it should drive the data bus. It is a synchronous input, so
external logic must guarantee setup and hold with respect to the system clock. If dOE_l is
sampled true at a system clock then EVx drives the data bus at the system clock if it has a
WRITE_BLOCK or STxC request pending (the request may already be on the cReq pins, or it
may appear on the cReq pins at the same system clock edge as the data appears). If dOE_l is
sampled false at the system clock then EVx tri-states the data bus on the next system clock
cycle. The cycle type is factored into the enable so that systems can leave dOE_l asserted
unless it is necessary to write a victim.


The dWSel_h inputs tell EVx which half of the 32-byte block of write data should be driven
onto the data bus (dOE_l permitting). They are synchronous inputs, so external logic must
guarantee setup and hold with respect to the system clock. If dWSel_h[1] is sampled false
at the end of a system clock cycle then bytes [15..0] are driven onto the data bus in the next
system clock cycle. If dWSel_h[1] is sampled true at the end of a system clock cycle then
bytes [31..16] are driven onto the data bus in the next system clock cycle. Once dWSel_h[1]
has been sampled true bytes [15..0] are lost; there is no backing up.


4.2.8 Primary Cache Invalidate 


External logic needs to be able to invalidate primary data cache blocks to maintain coherence. 
EVx provides a mechanism to perform the necessary invalidates, but enforces no policy as to 
when invalidates are needed. Simple systems may choose to invalidate more or less blindly, 
and complex systems may choose to implement elaborate invalidate filters. 


There are two situations where entries in the on-chip Dcache may need to be invalidated.


The first situation is the obvious one. Any time an external agent updates a block in memory
(for example, an I/O device does a DMA transfer into memory), and that block has been loaded
into the external cache, then the external cache block must be either invalidated or updated.
If that external cache block has been loaded into the Dcache then that Dcache block must be
invalidated.


The second situation is more subtle. If a system is maintaining the Dcache as a subset of
the external cache, and an Icache miss results in an external cache block being replaced, and
that external cache block has been loaded into the Dcache, then an invalidate is needed.


External logic invalidates an entry in the Dcache by asserting the dInvReq_h signal. EVx
samples dInvReq_h at every system clock. When EVx detects dInvReq_h asserted, it
invalidates the block in the Dcache whose index is on the iAdr_h pins.


EVx can accept an invalidate at every system clock. 


The dInvReq_h input is synchronous, and external logic must guarantee setup and hold with
respect to the system clock. The iAdr_h inputs are also synchronous, and external logic must
guarantee setup and hold with respect to the system clock in any cycle in which dInvReq_h
is true.


4.2.9 Interrupts


External interrupts are fed to EVx via the irq_h bus. The 6 interrupts are identical; they may 
be asynchronous; they are level sensitive; and they can be individually masked by PALcode. 


It is expected that on most ALPHA systems the combination of hardware and PALcode will
use these 6 inputs as a power fail interrupt, a halt interrupt, and 4 external interrupts
(with the timer interrupt, the interprocessor interrupt, and the corrected read data interrupt
wired to their normal IPL), but this is not enforced by EVx. Low-end systems could, for
example, use all of them as device interrupts, and arrange that their PALcode treats them all
as IPL20 interrupts, using fixed vectors. See Section 2.3.3 for more details on interrupts.


To aid pattern-driven chip testers, the irq_h pins may be driven synchronously with respect
to the system clock. See Chapter 6 for the setup and hold requirements of the irq_h
pins with respect to the system clock for this case.


4.2.10 Electrical Level Configuration


EVx can drive and receive either CMOS levels or 100K ECL levels (with assistance from 
resistors on the module). 


The vRef input supplies a reference voltage to the input sense circuits. If external logic ties 
this to VSS + 1.4V then all inputs sense TTL levels. If external logic ties this to VDD-1.3V 
(which can be obtained, for example, from the VBB output of an MC100E111) then all inputs 
sense ECL 100K levels. 


The eclOut_h input selects the output levels. If external logic ties this false then all outputs 
generate CMOS levels. If external logic ties this true then all outputs are switched into a 
mode in which external resistors can be used to generate ECL 100K compatible levels. 


4.2.11 Performance Monitoring 


The perf_cnt_h[1..0] pins provide a means of giving EV4’s internal performance monitoring 
hardware access to off-chip events. These pins are system clock synchronous inputs which 
may be selected via the ICCSR IPR to be inputs to the performance counters inside the EV4 
chip. If in a given system clock cycle a perf_cnt_h pin is sampled TRUE, and the pin is 
selected as the source of its respective performance counter, then the counter will increment. 


The perf_cnt_h[1..0] signals are unused in EV3. 


4.2.12 Tristate 


The tristate_l signal, if asserted, causes EV4 to float all of its output and bidirectional pins
with the exception of cpuClkOut_h, and causes EV3 to float all of its output and bidirectional
pins with the exception of cpuClkOut_h, sysClkOut1_h, sysClkOut1_l, sysClkOut2_h and
sysClkOut2_l. When tristate_l is asserted, EVx is forced into the reset state, but the irq_h
pins are not resampled.


4.2.13 Continuity 


The cont_l signal, if asserted, causes EVx to connect all of its pins to VSS, with the exception
of clkIn_h, clkIn_l, testClkIn_h, testClkIn_l, cpuClkOut_h, sysClkOut1_h, sysClkOut1_l,
sysClkOut2_h, sysClkOut2_l, VREF and cont_l.


4.3 64-Bit Mode 


EVx may be configured at reset to use a 64-bit wide external data bus, in which case
data_h[127..64] and check_h[27..14] are not used. In EV4 these pins are internally pulled to
VSS, while in EV3 they are left floating.


The dataA_h[3] pin is used as an additional address line for the external cache data RAMs.
Like the dataA_h[4] pin, it should drive a two input NOR gate, with the other input being
driven by external logic. EVx drives dataA_h[3] false during reset, during external cache
hold, and during any external cycle.


The dWSel_h[0] pin should be used by external logic along with the dWSel_h[1] pin to select
which quadword of a 32-byte block is driven onto data_h[63..0] during each system clock cycle
of an external WRITE_BLOCK or STxC transaction. The relationship between dWSel_h[1..0]
and the selected bytes of the 32-byte block is as follows:


Table 4-9: dWSel_h


dWSel_h[1..0]  Selected Bytes
00             [07..00]
01             [15..08]
10             [23..16]
11             [31..24]


External logic must select quadwords in increasing order within the 32-byte block, but is free
to skip over any quadword which does not have corresponding longword mask bits TRUE in
cWMask_h[7..0].
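
The selection rule for 64-bit mode can be sketched as follows. The names are hypothetical, and the sketch assumes that cWMask_h bit i corresponds to longword i, so that quadword q is covered by mask bits 2q and 2q+1.

```python
def quadwords_to_pull(cwmask):
    """Quadwords (0..3), in increasing order, that contain a valid longword."""
    return [q for q in range(4) if (cwmask >> (2 * q)) & 0b11]


def dwsel_sequence(cwmask):
    """dWSel_h[1..0] values, as (bit1, bit0) pairs, for each quadword pulled."""
    return [((q >> 1) & 1, q & 1) for q in quadwords_to_pull(cwmask)]


if __name__ == "__main__":
    mask = 0b00110001  # longwords 0, 4 and 5 valid
    print(quadwords_to_pull(mask), dwsel_sequence(mask))
```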


Systems should ignore dataCEOE_h[3..2] and dataWE_h[3..2].


External cache read hit transactions are extended to consist of four cache read cycles in 64-bit
mode, where each cache read cycle is (BC_RD_SPD + 1) CPU cycles in duration. The first
cache read cycle consists of a tag probe and data read, while the subsequent three cache
read cycles consist of data reads. The EVx bus interface optimizes the external cache read
hit transaction by wrapping cache read cycles around the quadword which EVx originally
requested. The dMapWE_h pin asserts 1 CPU cycle into the second cache read cycle and
remains asserted until one CPU cycle before the end of the fourth cache read cycle.


External cache write hit transactions consist of one cache tag probe cycle which is
(BC_RD_SPD + 1) CPU cycles long, followed by one, two, three or four external cache write
cycles which are each (BC_WR_SPD + 1) CPU cycles long. The EVx bus interface uses the
minimum number of cache write cycles required to write the necessary longwords within the
32-byte block.
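
The length of a 64-bit mode write hit follows directly from the two RAM speed fields and the number of write cycles. The sketch below is an estimate under a stated assumption: that the minimum number of write cycles equals the number of quadwords containing at least one valid longword (mask bits 2q and 2q+1 for quadword q). The function name is hypothetical.

```python
def write_hit_cpu_cycles_64bit(bc_rd_spd, bc_wr_spd, cwmask):
    """CPU cycles for a 64-bit mode external cache write hit."""
    write_cycles = sum(1 for q in range(4) if (cwmask >> (2 * q)) & 0b11)
    if write_cycles == 0:
        raise ValueError("write mask selects no longwords")
    probe = bc_rd_spd + 1                 # one tag probe cycle
    return probe + write_cycles * (bc_wr_spd + 1)


if __name__ == "__main__":
    print(write_hit_cpu_cycles_64bit(3, 3, 0b00001111))
```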


Note that the maximum latency from holdReq_h assertion to holdAck_h assertion in 64-bit
mode is longer than in 128-bit mode. Also, the guarantee which external logic must make
as to the availability of the external cache data RAMs when asserting tagOk is different for
64-bit mode than for 128-bit mode.


For external READ_BLOCK and LDxL transactions the EVx chip normally expects four
distinct dRAck_h acknowledgment cycles. The first non-IDLE dRAck_h sample informs EVx
to sample data bytes [7..0], the second to sample data bytes [15..8], and so on. Each quadword
is parity/ECC checked based on the dRAck_h code supplied with that quadword. In EV3
the dRAck_h code supplied with the fourth quadword determines whether the 32-byte block
is cached, while in EV4 the dRAck_h code supplied with the first quadword performs this
function.


4.4 Transactions


4.4.1 Reset


External logic resets EVx by asserting reset_l. When EVx detects the assertion of reset_l it
terminates all external activity, and places the output signals on the external interface into
the following state. Note that all of the control signals have been placed in the state that
allows external access to the external cache.


Table 4-10: Reset State


Pin            State
sRomOE_l
sRomClk_h
adr_h
data_h
check_h
tagCEOE_h
tagCtlWE_h
tagCtlV_h
tagCtlS_h
tagCtlD_h
tagCtlP_h
dataCEOE_h
dataWE_h
dataA_h
holdAck_h
cReq_h<2:0>    FFF


After asserting reset_l for long enough to reset the serial ROM (100 ns), external logic negates
reset_l.


When EVx detects the negation of reset_l it may load bits from an external serial ROM into its
internal Icache, based on the value placed on icMode_h[1..0]. The timing is shown below
(assuming EVx reads only 3 bits from the serial ROM):


[Timing diagram: reset_l, sRomOE_l and sRomClk_h, with sRomD_h sampled at three points,
once per sRomClk_h tick.]


Each half-tick of the sRomClk_h signal is 63 CPU cycles long, which guarantees the 200ns 
clock high and clock low specifications and the 400ns clock to data specification of the serial 
ROM with 5ns CPU cycles. 
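
The half-tick arithmetic is easy to check for other CPU cycle times. A minimal sketch, with hypothetical names, covering only the clock high and clock low requirement quoted above:

```python
HALF_TICK_CPU_CYCLES = 63          # length of each sRomClk_h half-tick
SROM_MIN_CLOCK_HIGH_LOW_NS = 200   # serial ROM clock high / clock low requirement


def srom_half_tick_ns(cpu_cycle_ns):
    """Duration of one sRomClk_h half-tick for a given CPU cycle time."""
    return HALF_TICK_CPU_CYCLES * cpu_cycle_ns


def meets_clock_high_low(cpu_cycle_ns):
    """True if the half-tick is at least the serial ROM minimum."""
    return srom_half_tick_ns(cpu_cycle_ns) >= SROM_MIN_CLOCK_HIGH_LOW_NS


if __name__ == "__main__":
    print(srom_half_tick_ns(5), meets_clock_high_low(5))
```

With 5 ns CPU cycles each half-tick is 315 ns, and a full sRomClk_h period is 630 ns.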


Recall that it is possible to disable the serial ROM mechanism altogether - see Section 4.2.3. 


4.4.2 Fast External Cache Read Hit


A fast external cache read consists of a probe read (overlapped with the first data read), 
followed by the second data read if the probe hits. 


The following diagram assumes that the external cache is using 4 cycle reads (BC_RD_SPD
= 3), 4 cycle writes (BC_WR_SPD = 3), and chip enable control (OE = L).


[Timing diagram: internal clocks 0 to 7, showing adr_h, tagCEOE_h, tagCtlWE_h, tagAdr_h,
tagCtl_h, iMapWE_h and dMapWE_h, dataCEOE_h, dataWE_h, dataA_h[4], data_h and check_h.
The RAMs drive tagAdr_h and tagCtl_h during the probe, and drive data_h and check_h with
the first (ram-0) and then the second (ram-1) half of the block.]


If the probe misses then the cycle aborts at the end of clock 3. 


If the probe hits and the miss address had bit 4 set then the two data reads would have been
swapped (dataA_h[4] would have been true in cycles 0, 1, 2, 3, and would have been false in
cycles 4, 5, 6, 7).
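
The swap rule amounts to reading first the 16-byte half that contains the miss address. A minimal sketch for 128-bit mode, with a hypothetical function name; the returned values are the dataA_h[4] levels for the first and second data reads.

```python
def data_read_order(miss_address):
    """dataA_h[4] level (0 or 1) for the first and second data reads of a hit."""
    first_half = (miss_address >> 4) & 1   # address bit 4 selects the 16-byte half
    return [first_half, first_half ^ 1]


if __name__ == "__main__":
    print(data_read_order(0x08), data_read_order(0x10))
```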


4.4.3 Fast External Cache Write Hit


A fast external cache write consists of a probe read, followed by 1 or 2 data writes.


The following diagram assumes that the external cache is using 4 cycle reads (BC_RD_SPD 
= 3), 4 cycle writes (BC_WR_SPD = 3), chip enable control (OE = L), and a 2 cycle write pulse 
centered in the 4 cycle write (BC_WE_CTL[15..1] = LLLLLLLLLLLLLHH). 


[Timing diagram: internal clocks 0 to 11, showing adr_h, tagCEOE_h, tagCtlWE_h, tagAdr_h,
tagCtl_h, dataCEOE_h, dataWE_h, dataA_h[4], data_h and check_h. The RAMs drive tagAdr_h
and tagCtl_h during the probe; EVx then drives tagCtl_h, and drives data_h and check_h
with the first (cpu-0) and then the second (cpu-1) half of the write data.]


Note that EVx drives the tagCtl_h pins one CPU cycle later than it drives the data_h and
check_h pins relative to the start of the write cycle. This is because, unlike data_h and
check_h, the tagCtl_h field must be read during the tag probe which precedes the write cycle.
Since EVx can switch its pins to a low impedance state much more quickly than most RAMs
can switch their pins to a high impedance state, EVx waits one CPU cycle before driving the
tagCtl_h pins in order to minimize tristate driver overlap.


If the probe misses then the cycle aborts at the end of clock 3. 


4.4.4 READ_BLOCK Transaction


A READ_BLOCK transaction appears at the external interface on external cache read misses, 
either because it really was a miss, or because the external cache has not been enabled. 


sysClkOut Cycle |   0   |   1   |   2   |   3   |   4   |   5   |   6   |
sysClkOut1_h    |---|   |---|   |---|   |---|   |---|   |---|   |---|
adr_h                 |-----------------------------------------|
RAM ctl         ----------|
data_h                                  |-0-----|       |-1-----|
check_h                                 |-0-----|       |-1-----|
cReq_h                  |---------------------------------------|
cWMask_h                |---------------------------------------|
dRAck_h                                 |-------|       |-------|
cAck_h                                                  |-------|


0. The cReq_h pins are always idle in the system clock cycle immediately before the
beginning of an external transaction. The adr_h pins always change to their final value
(with respect to a particular external transaction) at least one CPU cycle before the start
of the transaction.


1. The READ_BLOCK transaction begins. EVx has already placed the address of the
block containing the miss on adr_h. EVx places the quadword-within-block and the I/D
indication on cWMask_h. EVx places a READ_BLOCK command code on cReq_h. EVx
will clear the RAM control pins (dataA_h[4..3], dataCEOE_h[3..0] and tagCEOE_h) no
later than one CPU cycle after the system clock edge at which the transaction begins.


2. The external logic obtains the first 16 bytes of data. Although a single stall cycle has
been shown here, there could be no stall cycles, or many stall cycles.


3. The external logic has the first 16 bytes of data. It places it on the data_h and check_h
busses. It asserts dRAck_h to tell EVx that the data and check bit busses are valid. EVx
detects dRAck_h at the end of this cycle, and reads in the first 16 bytes of data at the
same time.


4. The external logic obtains the second 16 bytes of data. Although a single stall cycle has
been shown here, there could be no stall cycles, or many stall cycles.


5. The external logic has the second 16 bytes of data. It places it on the data_h and check_h
busses. It asserts dRAck_h to tell EVx that the data and check bit busses are valid. EVx
detects dRAck_h at the end of this cycle, and reads in the second 16 bytes of data at the
same time. In addition, the external logic places an acknowledge code on cAck_h to tell
EVx that the READ_BLOCK cycle is completed. EVx detects the acknowledge at the end
of this cycle, and may change the address.


6. Everything is idle. EVx could start a new external cache cycle at this time.


Since external logic owns the RAMs by virtue of EVx having deasserted its RAM control 
signals at the beginning of the transaction, external logic may cache the data by asserting its 
write pulses on the external cache during cycles 3 and 5. 


EVx performs ECC checking and correction (or parity checking) on the data supplied to it via 
the data and check busses if so requested by the acknowledge code. It is not necessary to 
place data into the external cache to get checking and correction. 


4.4.5 Write Block


A WRITE_BLOCK transaction appears at the external interface on external cache write 
misses (either because it really was a miss, or because the external cache has not been 
enabled), or on external cache write hits to shared blocks. 


sysClkOut Cycle |   0   |   1   |   2   |   3   |   4   |   5   |   6   |
sysClkOut1_h    |---|   |---|   |---|   |---|   |---|   |---|   |---|
adr_h                 |-----------------------------------------|
RAM ctl         ----------|
data_h                                  |-0-----|       |-1-----|
check_h                                 |-0-----|       |-1-----|
cReq_h                  |---------------------------------------|
cWMask_h                |---------------------------------------|
dOE_l                           |-------|       |-------|
dWSel_h                                         |-------|
cAck_h                                                  |-------|


0. The cReq_h pins are always idle in the system clock cycle immediately before the
beginning of an external transaction. The adr_h pins always change to their final value
(with respect to a particular external transaction) at least one CPU cycle before the start
of the transaction.


1. The WRITE_BLOCK cycle begins. EVx has already placed the address of the block on
adr_h. EVx places the longword valid masks on cWMask_h. EVx places a WRITE_BLOCK
command code on cReq_h. EVx will clear the RAM control pins (dataA_h[4..3],
dataCEOE_h[3..0] and tagCEOE_h) no later than one CPU cycle after the system clock
edge at which the transaction begins.


2. The external logic detects the command and asserts dOE_l to tell EVx to drive the first 16
bytes of the block onto the data bus. The timing shown for dOE_l is chosen for discussion
purposes; external logic may in fact assert dOE_l by default and only deassert it when it
needs to read the data RAMs, such as when writing back a victim block.


3. EVx drives the first 16 bytes of write data onto the data_h and check_h busses, and the
external logic writes it into the destination. Although a single stall cycle has been shown
here, there could be no stall cycles, or many stall cycles.


4. The external logic asserts dOE_l and dWSel_h to tell EVx to drive the second 16 bytes of
data onto the data bus.


5. EVx drives the second 16 bytes of write data onto the data_h and check_h busses, and
the external logic writes it into the destination. Although a single stall cycle has been
shown here, there could be no stall cycles, or many stall cycles. In addition, the external
logic places an acknowledge code on cAck_h to tell EVx that the WRITE_BLOCK cycle is
completed. EVx detects the acknowledge at the end of this cycle, and changes the address
and command to their next values.


6. Everything is idle. EVx may start a new external cache access at this time.


Since external logic owns the RAMs by virtue of EVx having deasserted its RAM control 
signals at the beginning of the transaction, external logic may cache the data by asserting its 
write pulses on the external cache during cycles 3 and 5. 


EVx performs ECC generation (or parity generation) on data it drives onto the data bus. 


Although in the above diagram external logic cycles through both 128-bit chunks of potential
write data, this need not always be the case. External logic must pull from the EVx chip only
those 128-bit chunks of data which contain valid longwords as specified by the cWMask_h
signals. The only requirement is that if both halves are pulled from EVx, then the lower half
must be pulled before the upper half.
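
The rule for which 128-bit chunks to pull can be sketched as a mask test. The function name is hypothetical, and the sketch assumes that cWMask_h bits [3..0] cover the lower 16 bytes and bits [7..4] the upper 16 bytes.

```python
def halves_to_pull(cwmask):
    """128-bit halves (0 = lower, 1 = upper) holding valid longwords, in pull order."""
    halves = []
    if cwmask & 0x0F:      # longwords 0..3 live in the lower half
        halves.append(0)
    if cwmask & 0xF0:      # longwords 4..7 live in the upper half
        halves.append(1)
    return halves


if __name__ == "__main__":
    print(halves_to_pull(0b00110001), halves_to_pull(0b11000000))
```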


4.4.6 LDxL Transaction 


An LDxL transaction appears at the external interface when an interlocked load instruction 
is executed. The external cache is not probed. With the exception of the command code output 
on the cReq pins, the LDxL transaction is exactly the same as a READ_BLOCK transaction. 
See Section 4.4.4.


4.4.7 STxC Transaction 


An STxC transaction appears at the external interface when a conditional store instruction 
is executed. The external cache is not probed. 


The STxC transaction is the same as the WRITE_BLOCK transaction, with the following 
exceptions: 


0. The code placed on the cReq pins is different. 
1. The cWMask field will never validate more than a single longword or quadword of data. 


2. External logic has the option of making the transaction fail by using the cAck code of
STxC_FAIL. It may do so without asserting either dOE_l or dWSel_h.


See Section 4.4.5.


4.4.8 BARRIER Transaction 


A BARRIER transaction appears on the external interface as a result of an MB instruction. 
The acknowledgment of the BARRIER transaction tells EVx that all invalidates have been 
supplied to it, and that any external write buffers have been pushed out to the coherence 
point. Any errors detected during these operations can be reported to EVx when the BARRIER 
transaction is acknowledged. 


sysClkOut Cycle |   0   |   1   |   2   |
sysClkOut_h     |---|   |---|   |---|   |
cReq_h          |---------------|
cAck_h                  |-------|


0. The BARRIER transaction begins. EVx places the command code for BARRIER onto the 
cReq_h outputs. 


1. The external logic notices the BARRIER command, and since it has completed processing
the command (it isn’t going to do anything), it places an acknowledge code on the cAck_h
inputs.


2. EVx detects the acknowledge on cAck_h, and removes the command. The external logic 
removes the acknowledge code from cAck_h. The cycle is finished.


4.4.9 FETCH Transaction


A FETCH transaction appears on the external interface as a result of a FETCH instruction. 
The transaction supplies an address to the external logic, which may choose to ignore it, or 
use it as a memory-to-cache prefetching hint. 


sysClkOut Cycle |   0   |   1   |   2   |   3   |
sysClkOut_h     |---|   |---|   |---|   |---|   |
adr_h               |-------------------|
RAM ctl         ----------|
cReq_h                  |---------------|
cAck_h                          |-------|


0. The cReq_h pins are always idle in the system clock cycle immediately before the 
beginning of an external transaction. The adr_h pins always change to their final value 
(with respect to a particular external transaction) at least one CPU cycle before the start 
of the transaction. 


1. The FETCH transaction begins. EVx has already placed the effective address of the
FETCH on the address outputs. EVx places the command code for FETCH on the cReq_h
outputs. EVx will clear the RAM control pins (dataA_h[4..3], dataCEOE_h[3..0] and
tagCEOE_h) no later than one CPU cycle after the system clock edge at which the
transaction begins.


2. The external logic notices the FETCH command, and since it has completed processing 
the command (it isn’t going to do anything), it places an acknowledge code on the cAck_h 
inputs. 


3. EVx detects the acknowledge on cAck_h, and removes the address and the command. The 
external logic removes the acknowledge code from cAck_h. The cycle is finished.


4.4.10 FETCHM Transaction 


A FETCHM transaction appears on the external interface as a result of a FETCHM instruc- 
tion. The transaction supplies an address to the external logic, which may choose to ignore 
it, or use it as a memory-to-cache prefetching hint. With the exception of the command code 
placed on cReq_h, the FETCHM transaction is the same as the FETCH transaction. See
Section 4.4.9.


Chapter 5
DC Characteristics 


5.1 Overview 


EV3 and EV4 are capable of running in a CMOS/TTL environment or an ECL environment. 
The chips will be tested and characterized in a CMOS environment. The specifications 
below assume a CMOS/TTL environment. Differences for an ECL environment are noted 
in Section 5.2. 


5.1.1 Power Supply 


In CMOS mode the VSS pins are connected to 0.0V, and the VDD pins are connected to 3.3V,
+/- 5%.


To prevent damage to EV4, it is important that the 3.3V power supply be stable before any
of EV4’s input or bidirectional pins are allowed to rise above 4.0V. System designers should
note that this is exactly opposite to the rule used by 5.0V inputs in CMOS-3, so care should
be taken when "borrowing" power supplies from CMOS-3 systems.


To help in meeting this requirement, the assertion levels of EV4’s input pins have been
arranged so that their default state is the electrical low state. This makes them active high,
with the exception of tagOk_l and dOE_l, which are true by default.


5.1.2 Reference Supply 
The vRef analog input should be connected to a 1.4V +/-10% reference supply. 


5.1.3 Input Clocks 


clkIn (_h,_l) is expected to be a differential signal generated from an ECL oscillator circuit,
although non-ECL circuits may also be used. It may be AC coupled, with a nominal DC bias
of VDD/2 set by a high-impedance (i.e. >1K) resistive network on chip. It need not be AC
coupled if VDD is used as the VCC supply to the ECL oscillator. See the AC Characteristics
chapter for more detail.


5.1.4 Signal pins


Input pins are ordinary CMOS inputs with standard TTL levels; see Table 5-1. Once power
has been applied and vRef has met its hold time, the majority of input pins can be driven
by 5.0V (nominal) signals without harming EV4. There are some signals that are sampled
before vRef is stable, and these signals cannot be driven above the power supply. These
signals are:


• dcOk_h
• tristate_l
• cont_l
• eclOut_h


Output pins are ordinary 3.3V CMOS outputs. Although output signals are rail-to-rail, timing
is specified to standard TTL levels; see Table 5-1.


Bidirectional pins are ordinary 3.3V CMOS bidirectional. On input, they act like input pins. 
On output, they drive like output pins. 


Once power has been applied, input (except as noted above) and bidirectional pins can be driven
to a maximum DC voltage of 5.5V without harming EV4 (it is not necessary to use static
RAMs with 3.3V outputs).


Table 5-1: CMOS DC Characteristics


        Parameter                        Requirements
Symbol  Description                  Min    Max    Units  Test Conditions

TTL Inputs/Outputs
Vih     High level input voltage     2.0           V
Vil     Low level input voltage             0.8    V
Voh     High level output voltage    2.4           V      Ioh = -100uA
Vol     Low level output voltage            0.4    V      Iol = 3.2mA

Power/Leakage
Icin    Clock input leakage          -50    50     uA     -0.5<Vin<5.5V
        Input leakage current        10     10     uA     0<Vin<Vdd V
Iol     Output leakage current       -10    -10    uA
        (three-state)


5.2 ECL 100K Mode


In ECL 100K mode a combination of on-chip and off-chip circuits provide ECL 100K compat- 
ible interfaces. 


5.2.1 Power Supply 


In ECL 100K mode the VDD pins are connected to 0.0V, and the VSS pins are connected to 
-3.3V, +/- 5%. 


5.2.2 Reference Supply 
In ECL 100K mode the vRef input is connected to a reference supply at VDD-1.3V. The best
way to generate the reference supply is to use the VBB output provided by several chips, such
as the ECLinPS MC100E111.


5.2.3 Inputs 


In ECL 100K mode inputs appear to be ordinary ECL 100K inputs, with the exception that 
they lack the pull down resistor that is normally present in ECL 100K circuits. 


5.2.4 Outputs 


In ECL 100K mode external resistors create the correct ECL 100K levels. The following 
stylized circuit is used. 


            +----+
  CPU |-----| R1 |-----+-----| ECL 100K
            +----+     |
           50 ohms    +--+
                      |R2| 100 ohms
                      +--+
                       |
                     -2.0V


5.2.5 Bidirectionals 


In ECL 100K mode the bidirectional pins should be converted into unidirectional input and 
output busses as close to EV4 as possible. The EV4 chip bidirectional bus is buffered and 
driven onto the system output bus. The system input bus is driven onto EV4’s bidirectional 
bus using cut-off drivers controlled by the CPU’s output enables. 


The same resistor network used on output pins is used on bidirectional pins.


5.3 Power Dissipation


A comprehensive power dissipation analysis consisting of both analytic and empirical tech-
niques was performed on EV3. Once a program that caused maximum EV3 dynamic power
dissipation was identified, it was run on the logic simulator and, using analysis tools, EV4
power dissipation was analytically predicted. The results from that analysis are shown in
Table 5-2.


Table 5-2: EV4 Power Dissipation @Vdd=3.45V


Speed    Min    Typ     Max     Units
5.0ns    24     29.5    36      Watts
6.6ns    19     23      27.5    Watts


The minimum power occurs during reset, the Typ column is the worst case average program 
and the Max column is the worst case pathological program. An important observation is the 
fact that all normal programs observed to date (both stand-alone and under ULTRIX) run in 
a range between Min and Typ. So while the pathological case is theoretically possible, it is 
extremely unlikely in practice. The following approach is recommended for system designers: 


• Design the EV4 heat sink and thermal environment to keep the die temperature to 85C
  for the Typ power case. This is certainly the limiting case for average power dissipation
  to be used for long term reliability assessment. With Typ designed for 85C, then in all
  cases, Max will result in die temperatures under the 100C design, test and process qual
  limits.


• Design the overall enclosure and the power supply to handle the Max power case.


As a further refinement, it is possible to account for applications where the maximum supply 
voltage is other than 3.45V and/or the operating frequency is not 150 or 200MHz. The 
formulae for calculating Idd under various conditions are as follows: 


Idd (Min) = 116mA/V + 9.6mA/V*MHz 
Idd (Typ) = 116mA/V + 11.7mA/V*MHz 
Idd (Max) = 116mA/V + 14.4mA/V*MHz 
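The formulae above can be evaluated with a short Python sketch. It rests on one reading of the units that the text does not spell out: the mA/V term is multiplied by Vdd, the mA/V*MHz term by Vdd and by the operating frequency in MHz, and power is taken as Idd times Vdd. The function names are hypothetical. Under that reading the 200MHz results at 3.45V come out near 24.2, 29.2 and 35.7 Watts, close to the 5.0ns row of Table 5-2.

```python
# Coefficients from the Idd formulae: 116 mA/V static term, plus a
# per-case dynamic term in mA/(V*MHz).
IDD_STATIC_MA_PER_V = 116.0
IDD_DYNAMIC_MA_PER_V_MHZ = {"min": 9.6, "typ": 11.7, "max": 14.4}


def idd_ma(case: str, vdd: float, freq_mhz: float) -> float:
    """Supply current in mA (assumed reading: Vdd * (static + dynamic * f))."""
    dynamic = IDD_DYNAMIC_MA_PER_V_MHZ[case]
    return vdd * (IDD_STATIC_MA_PER_V + dynamic * freq_mhz)


def power_w(case: str, vdd: float, freq_mhz: float) -> float:
    """Power in Watts, taken as Idd * Vdd."""
    return idd_ma(case, vdd, freq_mhz) / 1000.0 * vdd


if __name__ == "__main__":
    for case in ("min", "typ", "max"):
        print(case, round(power_w(case, 3.45, 200.0), 1), "W at 3.45V, 200MHz")
```

The same functions can be called with a lower supply voltage or frequency to size a heat sink or power supply for a derated design.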
Chapter 6
AC Characteristics


This chapter contains the AC specification for EV4. Timing parameters are given for the
nominal speed binned (6.6ns) parts.


6.1 vRef


vRef is an analog reference voltage used by the input buffers of all signals except clkIn_h,_l,
testclkIn_h,_l, tagOk_h,_l, dcOk_h, eclOut_h, tristate_l, and cont_l. Note that upon power
up, reset_l cannot be sampled until vRef is stable. There is a large internal capacitance on
vRef, and an RC delay between its pin and the input buffers. Therefore, systems must not
assert dcOk_h until a suitable interval following the stability of the vRef source. This interval
is specified as the greater of 1us and 10nF * Zout, where Zout is the vRef source impedance.
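As a small worked example of the interval rule, the following Python sketch computes the minimum wait between a stable vRef source and the assertion of dcOk_h. The function name and the example source impedances are hypothetical; only the 1us floor and the 10nF factor come from the text.

```python
def dcok_min_delay_s(zout_ohms: float) -> float:
    """Minimum delay in seconds: the greater of 1us and 10nF * Zout."""
    return max(1e-6, 10e-9 * zout_ohms)


if __name__ == "__main__":
    for zout in (50.0, 100.0, 1000.0):   # example source impedances in ohms
        print(f"Zout={zout:g} ohm -> wait at least {dcok_min_delay_s(zout) * 1e6:g} us")
```

With these numbers the 1us floor governs for any source impedance up to 100 ohms; above that the RC term dominates.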


6.2 Input Clocks 


The input clocks clkIn_h,_l and testclkIn_h,_l are received differentially, then XORed to
provide the time-base for EV4 when dcOk_h is asserted. We expect testclkIn_h,_l to be used
only by testers unable to drive clkIn_h,_l at full speed. The terminations on these signals are
designed to be compatible with system oscillators of arbitrary DC bias. Schematically, they
look as follows:

+-----+         +-----+
| PIN |----+----| PAD |----+------------------------> (to diff-amp)
+-----+    |    +-----+    |
           |               |    50ohms        Hi Z
         ----- Cpkg        +-----RRRR----+----RRRR----+
         -----                           |            |
           |                           -----          |
           |                           ----- 40pF     |
           |                             |            |
          ---                           ---    Vbias = (Vdd-Vss)/2


This is designed to approximate a 50ohm termination for the purpose of impedance matching
for those systems (if any) which drive input clocks across long traces. Furthermore, the high
impedance bias driver allows a clock source of arbitrary DC bias to be AC coupled to EV4.
The peak-to-peak amplitude of the clock source must be between 0.6V and 3.0V as seen by
EV4. Either a "square-wave" or a sinusoidal source may be used. Note that full-rail clocks
may be driven by testers.


The following table lists the input clock cycle times for the various EV4 bin speeds. Note that
these periods equal one-half the corresponding cpu cycle times.


Table 6-1: Input Clock Timing


Name               Fast Bin     Nominal Bin   Slow Bin     Unit
clkIn period min   2.5          3.3           5.0          ns
clkIn period max   tbd          tbd           tbd          ns
clkIn symmetry     50%+/-10%    50%+/-10%     50%+/-10%    percent
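The relation between Table 6-1 and the cpu cycle time can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the cpu cycle times of 5.0, 6.6 and 10.0ns are inferred by doubling the minimum clkIn periods in the table.

```python
def clkin_period_ns(cpu_cycle_ns: float) -> float:
    """Minimum clkIn period: one-half the cpu cycle time."""
    return cpu_cycle_ns / 2.0


def clkin_freq_mhz(cpu_cycle_ns: float) -> float:
    """Corresponding maximum clkIn frequency in MHz."""
    return 1000.0 / clkin_period_ns(cpu_cycle_ns)


if __name__ == "__main__":
    for bin_name, cycle in (("fast", 5.0), ("nominal", 6.6), ("slow", 10.0)):
        print(bin_name, clkin_period_ns(cycle), "ns,",
              round(clkin_freq_mhz(cycle), 1), "MHz")
```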


6.3 cpuClkOut_h


The cpuClkOut_h signal is expected to be used only by an ECL synchronizer in systems using
the tagOk protocol. In order to accommodate ECL levels, the driver consists of only a PMOS
pullup device. ECL 100K levels may be constructed with a 50ohm board resistor in series
with the driver and a 100ohm board resistor between the load and (Vdd - 2V). CMOS Vdd
must equal ECL Vcc in this scheme. Note that the trace must be short to ensure good signal
integrity if, as expected, the board impedance is not in the vicinity of 100ohm.





6.4 Test Configuration 


All outputs and bidirectional signals including clocks but excluding cpuClkOut_h are specified 
with respect to a standard 40pF load as shown below. All timing is specified with respect to 
the crossing of standard TTL input levels at 0.8V and 2.0V. 


EV4 |-------------+
PIN |             |
    |           ----- 40pF
                -----
                  |
                 GND


6.5 Fast Cycles on External Cache 


From a system standpoint, fast cycles on the external cache are completely unclocked. The 
two cases of read and write cycles require separate treatment. 


6.5.1 Fast Read Cycles 


External logic must meet the maximum flow-through delay, as defined with respect to the 
circuits below. 


     | Address                        | Address          +------------+
 EV4 |-------------+              EV4 |----------------->|            |
 PIN | Control     |              PIN | Control          |            |
     |           ----- 40pF           |                  |  External  |
                 -----                                   |            |
                   |                                     |   Logic    |
                  GND                 | Data             |            |
                                  EV4 |<-----------------|            |
                                  PIN |                  +------------+


"Address" refers to adr_h and dataA_h. "Control" refers to dataCEOE_h and tagCEOE_h. 
"Data" refers to data_h, check_h, tagAdr_h, and tagCtl_h. Assume that address/control is 
driven from the same EV4 internal clock edge in the two cases above. External flow-through 
delay is defined as the delay between address/control valid to the 40pF standard load in the 
left-hand case and data valid to EV4 in the right-hand case. It may not exceed the fast read 
cycle time (i.e. BC_RD_SPD+1 cpu cycles) less 5.0ns. EV4 guarantees that its address drivers 
are enabled at least one cpu cycle prior to a fast cache access, such that adr_h need never be 
pulled down from 5V during the cycle. 
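The flow-through budget for fast reads can be worked through in Python. This is a sketch under stated assumptions: the function names are hypothetical, the default cpu cycle time is the nominal 6.6ns, and the BC_RD_SPD value of 3 used in the example is chosen only for illustration.

```python
def max_flow_through_ns(bc_rd_spd: int, cpu_cycle_ns: float = 6.6) -> float:
    """Fast read cycle time (BC_RD_SPD+1 cpu cycles) less 5.0ns."""
    fast_read_cycle_ns = (bc_rd_spd + 1) * cpu_cycle_ns
    return fast_read_cycle_ns - 5.0


def meets_fast_read_budget(measured_ns: float, bc_rd_spd: int,
                           cpu_cycle_ns: float = 6.6) -> bool:
    """True if a measured external flow-through delay fits the budget."""
    return measured_ns <= max_flow_through_ns(bc_rd_spd, cpu_cycle_ns)


if __name__ == "__main__":
    budget = max_flow_through_ns(3)          # illustrative BC_RD_SPD value
    print(f"budget = {budget:.1f} ns")
    print("20.0 ns path ok:", meets_fast_read_budget(20.0, 3))
```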
6.5.2 Fast Write Cycles


External logic must guarantee that fast writes complete. Data, address, and control (including 
dataWE_h and tagCtlWE_h) are driven by EV4 with identical timing from its internal clock. 
Actual pulse widths are at least the nominal width less 1.5ns, or 2.9ns on lines precharged 
to 5V (i.e. data lines following a probe read). The timing of dMapWE_h during dcache read 
hits is specified in the same way. 


6.6 External Cycles 


All external cycle timing is referenced to the rising edge of sysClkOut1_h. Input setup
and hold times and output delay and enable times are referenced to the point at which
sysClkOut1_h crosses 0.8V. (Output enable time is defined as output delay time from a
tri-stated state. It may differ from the nominal delay because it may entail pulling down
from a 5V level.) Output hold times are referenced to the point at which sysClkOut1_h
crosses 2.0V. They denote the times beyond sysClkOut1_h for which outputs hold their valid
values from the previous cycle. Note that these times are negative, meaning that data may
lose validity BEFORE sysClkOut1_h becomes valid high. (This is possible because there is no
cause-effect relationship between the system clock outputs and data. In fact, the system clock
outputs are nothing more than data pins which happen to switch in a fixed pattern.) Address
enable timing is relevant only for systems using the holdReq protocol with two cpu cycles per
system cycle. All bidirectional lines may be considered enabled or disabled simultaneously
with the rising edge of sysClkOut1_h.


Table 6-2: External Cycles

Name                                            Min            Max    Units

Enable, sysClkOut1_h to
  adr_h, data_h, check_h                                       2.9    ns

Output Delay, sysClkOut1_h to
  adr_h, data_h, check_h, cReq_h, cWMask_h,                    1.5    ns
  holdAck_h

Output hold, sysClkOut1_h to
  adr_h, data_h, check_h, cReq_h, cWMask_h,     -1.5                  ns
  holdAck_h

Input Setup relative to sysClkOut1_h
  dRAck_h, dWSel_h, dOE_l                       9.3                   ns
  cAck_h                                        Tcyc/2 + 6.0          ns
  holdReq_h                                     4.8                   ns
  dInvReq_h, iAdr_h                             4.5                   ns
  data_h, check_h                               3.5                   ns

Input Hold relative to sysClkOut1_h
  cAck_h, dRAck_h, dWSel_h, dOE_l               0                     ns
  data_h, check_h                               0                     ns
  holdReq_h, dInvReq_h, iAdr_h                  0                     ns


The cAck_h input setup time is a function of the chip cycle time (Tcyc). At the nominal 6.6nS
cycle time, required setup on the cAck_h pin is 9.3nS. 
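The dependence of the cAck_h setup time on cycle time can be expressed as a one-line Python function. The function name is hypothetical; the formula Tcyc/2 + 6.0 is taken from Table 6-2, and the 5.0ns case is shown only as an example of another cycle time.

```python
def cack_setup_ns(tcyc_ns: float) -> float:
    """Required cAck_h setup to sysClkOut1_h: Tcyc/2 + 6.0 ns."""
    return tcyc_ns / 2.0 + 6.0


if __name__ == "__main__":
    for tcyc in (5.0, 6.6):
        print(f"Tcyc={tcyc} ns -> setup {cack_setup_ns(tcyc):.1f} ns")
```

At the nominal 6.6ns cycle time this evaluates to 9.3ns, matching the figure quoted above.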


6.7 tagEq 


When active during external cache hold, the timing of tagEq_l is specified from when its
inputs become valid at the EV4 pins.


Table 6-3: tagEq


Name                          Min    Max     Units
Delay, adr_h -> tagEq_l              17.0    ns
Delay, tagAdr_h -> tagEq_l           17.0    ns


6.8 tagOk


The tagOk_h,_l signals are expected to be driven to EV4 directly from the final stage of an
ECL synchronizer clocked by cpuClkOut_h. As in the case of fast external cache cycles, the
system must meet a maximum flow-through delay. This delay is defined with respect to the
circuits below.


     | cpuClkOut_h                         | cpuClkOut_h      +------------+
 EV4 |-----RRRR----+------+            EV4 |----------------->|            |
 PIN |    50ohms   |      | 10pF       PIN |                  |            |
     |             |    -----              |                  |  External  |
                   |    -----                                 |            |
   Vdd-2.0V        |      |                                   |   Logic    |
     o-----RRRR----+      V                | tagOk_h,_l       |            |
          100ohms                      EV4 |<-----------------|            |
                                       PIN |                  +------------+


Assume that cpuClkOut_h is driven from the same EV4 internal clock edge in the two cases
above. External flow-through delay is defined as the delay between cpuClkOut_h valid to the
10pF ECL "standard" load in the left-hand case and tagOk_h,_l valid to EV4 in the
right-hand case. It may not exceed the nominal cpu cycle time less 3.9ns. Note that board
resistors must be part of "external logic" in the circuit on the right. For purposes of this
specification, cpuClkOut_h is considered valid when it crosses the ECL threshold "Vbb" (equal
to roughly Vcc - 1.3V) and tagOk is considered valid when the differential lines cross each other.
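The tagOk flow-through limit can be computed the same way as the other timing budgets. The Python sketch below uses hypothetical function names and takes the nominal 6.6ns cpu cycle time as its default.

```python
def max_tagok_flow_through_ns(cpu_cycle_ns: float = 6.6) -> float:
    """Nominal cpu cycle time less 3.9ns."""
    return cpu_cycle_ns - 3.9


def tagok_path_ok(measured_ns: float, cpu_cycle_ns: float = 6.6) -> bool:
    """True if the synchronizer path (board resistors included) fits the limit."""
    return measured_ns <= max_tagok_flow_through_ns(cpu_cycle_ns)


if __name__ == "__main__":
    print(f"limit at 6.6 ns cycle: {max_tagok_flow_through_ns():.1f} ns")
```

At the nominal cycle time this leaves about 2.7ns for the external synchronizer path, which is why the text expects the final ECL stage to drive tagOk directly.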


6.9 Tester Considerations 


6.9.1 Asynchronous Inputs 


The signals reset_l, irq_h, and sRomD_h (in serial port mode) are asynchronous during normal
system operation. However, for test purposes they should be driven synchronously with
sysClkOut1_h with the timing given below. Note once again that these parameters are given
with respect to the time at which the rising edge of sysClkOut1_h crosses 0.8V.


Table 6-4: Asynchronous Signals on a Tester


Name                              Min    Max    Units
Setup, reset_l -> sysClkOut1_h    5.0           ns
Setup, irq_h -> sysClkOut1_h      5.0           ns
Hold, irq_h -> sysClkOut1_h       0             ns
Setup, sRomD_h -> sysClkOut1_h    5.0           ns
Hold, sRomD_h -> sysClkOut1_h     0             ns


6.9.2 Signals Timed from Cpu Clock 


Due to the speed of EV4, it is expected that at-speed testing will be done with tester cycle 
equal to system cycle (i.e. sysClkOut1_h). However, fast external cache operation and serial 
ROM operation are timed from internal cpu clock. Therefore, input sampling and output 
enabling and switching may occur at different time points within a tester cycle from one cycle 
to the next. Fortunately, the number of such points is finite, equal to the number of cpu cycles 
per tester cycle. For any given transaction, each signal will have its standard external cycle 
timing with respect to the rising edge of sysClkOut1_h OR to a "phantom" edge offset from
sysClkOut1_h by exactly an integer number of cpu cycles. (Note that dataA_h, dataCEOE_h,
dataWE_h, tagCEOE_h, tagCtlWE_h, and dMapWE_h have the same delay timing as
adr_h.) Therefore, outputs may be sampled deterministically with appropriate placement of the
tester strobe and inputs may be received deterministically with appropriate placement of the 
drive edge. Bidirectional signals present a different problem. Because the tester can enable 
or disable a given driver at just one point within its cycle, it must in the worst case drive an 
input beyond its EV4 sample point by at least (N-1) cpu cycles, where N is the number of cpu 
cycles per system cycle. However, in the worst case EV4 will enable its drivers just one cpu 
cycle after sampling (for example, tagCtl_h following probe write). Therefore, the number of 
cpu cycles per system cycle must not exceed two to avoid driver conflict between EV4 and the 
tester. 


The serial ROM outputs sRomOE_l and sRomClk_h may be strobed with the same timing as
the data_h pins when driven by EV4. The serial ROM input sRomD_h may be switched with 
the same timing used in serial port mode. 


6.10 Scaling for EV3 


Prototype systems using EV3 must make allowance for the use of CMOS-3 technology by
scaling all timing parameters (except those on vRef) by a factor of 1.5. Systems which use
the holdReq protocol with 5V address drivers are further constrained to keep "flow-through
delay" on fast cache read cycles less than the nominal fast read cycle time less 9.6ns. In
addition, one-half a cpu cycle must be added to the maximum delay between tagAdr_h and
tagEq_l due to an internal latch on the tagAdr_h inputs to the tagEq_l comparator.
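The EV3 scaling rule can be illustrated with a short Python sketch. Several things here are assumptions rather than statements from the text: the parameter names in the dictionary are labels chosen for this example, the half cpu cycle is added after the 1.5 scaling rather than before, and the 10.0ns EV3 cpu cycle time is a placeholder value.

```python
EV3_SCALE = 1.5  # scaling factor for CMOS-3 parts, from the text


def scale_for_ev3(ev4_params_ns: dict, unscaled_prefixes=("vRef",)) -> dict:
    """Scale EV4 timing parameters by 1.5, leaving vRef parameters alone."""
    scaled = {}
    for name, value in ev4_params_ns.items():
        if name.startswith(tuple(unscaled_prefixes)):
            scaled[name] = value               # vRef timing is not scaled
        else:
            scaled[name] = value * EV3_SCALE
    return scaled


def ev3_tagadr_to_tageq_max_ns(ev4_max_ns: float, ev3_cpu_cycle_ns: float) -> float:
    """Scaled tagAdr_h -> tagEq_l delay plus one-half cpu cycle (assumed order)."""
    return ev4_max_ns * EV3_SCALE + ev3_cpu_cycle_ns / 2.0


if __name__ == "__main__":
    ev4 = {"holdReq_h setup": 4.8, "vRef settle": 1000.0}
    print(scale_for_ev3(ev4))
    print(ev3_tagadr_to_tageq_max_ns(17.0, 10.0), "ns (placeholder 10.0 ns cycle)")
```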
Chapter 7
Package 


EV3 and EV4 packages are pin compatible. Figure 7-1 shows pin locations for both EVx
chips. Pin numbers are compatible with EVx bodies used in the Artemis/Lyre CAD system.


Figure 7-1: PGA Cavity Up View


Chapter 8
Pinout 


8.1 Overview 


This chapter contains the entire EV4 pinout ordered by PGA location. In addition, it contains 
a list of differences between the EV4 pinout and the NVAX+ pinout. 


8.2 Change History 


Name  Date       Comment
rrh   10-sep-90  First released this format
rrh   19-sep-90  Pin R23, tagAdr<17>, type was changed from
                 B to I. The B was a typo and the change
                 does not represent a functional change.
ejm   22-apr-91  The following is a list of changed signals:
                 EV4 SIGNAL       EV3 SIGNAL    CHANGE
                 icMode_h<0>      sromFast      name change only
                 icMode_h<1>      scan_h<2>     new I formerly O
                 perf_cnt_h<0>    perf_h<1>     new I formerly O
                 perf_cnt_h<1>    perf_h<2>     new I formerly O
                 spare<4>         scan_h<3>     new N formerly O
                 spare<5>         scan_h<0>     new N formerly O
                 spare<6>         scan_h<1>     new N formerly O
                 spare<7>         perf_h<3>     new N formerly O
                 spare<8>         perf_h<0>     new N formerly O
ejm   30-apr-91  remove unused SIG No., replace with
                 Pin No. compatible with Artemis/Lyre files.


8.3 EV4 Pinout


PGA     PAD     PIN
LOC.    No.     No.     TYPE NAME

A1      009     001     data_h<33>
A2      008     002     data_h<97>
A3      004     003     data_h<98>
A4      426     004     data_h<100>
A5      421     005     data_h<38>
A6      418     006     check_h<27>
A7      412     007     data_h<104>
A8      407     008     data_h<42>
A9      403     009     data_h<44>
A10     398     010     data_h<109>
A11     391     011     data_h<47>
A12     387     012     data_h<49>
A13     386     013     data_h<113>
A14     379     014     data_h<52>
A15     373     015     check_h<12>
A16     367     016     data_h<55>
A17     364     017     data_h<120>
A18     358     018     data_h<122>
A19     355     019     check_h<7>
A20     349     020     data_h<60>
A21     347     021     data_h<61>
A22     343     022     data_h<62>
A23     340     023     data_h<127>
A24     ???     024     check_h<9>

B1      014     025     check_h<15>
B2      046     026     VDD plane
B3      003     027     data_h<35>
B4      039     028     VSS plane
B5      424     029     data_h<101>
B6      054     030     VDD plane
B7      413     031     data_h<40>
B8      047     032     VSS plane
B9      404     033     data_h<107>
B10     062     034     VDD plane
B11     394     035     data_h<110>
B12     055     036     VSS plane
B13     383     037     data_h<50>
B14     070     038     VDD plane
B15     372     039     check_h<26>
B16     063     040     VSS plane
B17     363     041     data_h<57>
B18     078     042     VDD plane
B19     354     043     check_h<21>
B20     071     044     VSS plane
B21     346     045     data_h<125>
B22     086     046     VDD plane
B23     079     047     VSS plane
B24     335     048     check_h<8>





C1      016     049     check_h<16>
C2      119     050     VSS plane
C3      010     051     data_h<96>
C4      002     052     data_h<99>
C5      425     053     data_h<37>
C6      419     054     check_h<13>
C7      414     055     data_h<103>
C8      410     056     data_h<105>
C9      405     057     data_h<43>
C10     399     058     data_h<45>
C11     395     059     data_h<46>
C12     388     060     data_h<112>
C13     382     061     data_h<114>
C14     378     062     data_h<116>
C15     ???     063     data_h<54>
C16     366     064     data_h<119>
C17     362     065     data_h<121>
C18     357     066     check_h<11>
C19     ???     067     data_h<59>
C20     348     068     data_h<124>
C21     342     069     data_h<126>
C22     336     070     check_h<23>
C23     330     071     dRAck_h<0>
C24     331     072     spare<3>

D1      022     073     data_h<94>
D2      017     074     check_h<2>
D3      015     075     check_h<1>
D4      005     076     data_h<34>
D5      427     077     data_h<36>
D6      420     078     data_h<102>
D7      415     079     data_h<39>
D8      411     080     data_h<41>
D9      406     081     data_h<106>
D10     402     082     data_h<108>
D11     396     083     check_h<24>
D12     389     084     data_h<48>
D13     381     085     data_h<51>
D14     375     086     data_h<53>
D15     370     087     data_h<118>
D16     365     088     data_h<56>
D17     359     089     data_h<58>
D18     356     090     check_h<25>
D19     350     091     data_h<123>
D20     341     092     data_h<63>
D21     334     093     check_h<22>
D22     328     094     dRAck_h<2>
D23     152     095     VDD plane
D24     325     096     dOE_l

E1      023     097     data_h<30>
E2      126     098     VDD plane
E3      021     099     data_h<31>
E4      011     100     data_h<32>
E5      226     101     VDD plane
E6      235     102     VSS plane
E7      234     103     VDD plane
E8      243     104     VSS plane
E9      242     105     VDD plane
E10     255     106     VSS plane
E11     397     107     check_h<10>
E12     390     108     data_h<111>
E13     380     109     data_h<115>
E14     374     110     data_h<117>
E15     266     111     VDD plane
E16     279     112     VSS plane
E17     278     113     VDD plane
E18     291     114     VSS plane
E19     290     115     VDD plane
E20     303     116     VSS plane
E21     329     117     dRAck_h<1>
E22     324     118     dWSel_h<0>
E23     323     119     dWSel_h<1>
E24     322     120     cAck_h<0>

F1      028     121     data_h<92>
F2      027     122     data_h<29>
F3      026     123     data_h<93>
F4      020     124     data_h<95>
F5      231     125     VSS plane
F6      230     126     VDD plane
F7      239     127     VSS plane
F8      238     128     VDD plane
F9      249     129     VSS plane
F10     248     130     VDD plane
F11     261     131     VSS plane
F12     254     132     VDD plane
F13     267     133     VSS plane
F14     260     134     VDD plane
F15     273     135     VSS plane
F16     272     136     VDD plane
F17     285     137     VSS plane
F18     284     138     VDD plane
F19     297     139     VSS plane
F20     296     140     VDD plane
F21     319     141     cAck_h<1>
F22     318     142     cAck_h<2>
F23     ???     143     VSS plane
F24     317     144     holdReq_h

















G1      033     145     data_h<27>
G2      111     146     VSS plane
G3      032     147     data_h<91>
G4      029     148     data_h<28>
G5      360     149     VDD plane
G6      369     150     VSS plane
G19     N/A     151     VDD plane
G20     N/A     152     VSS plane
G21     316     153     holdAck_h
G22     ???     154     dataCEOE_h<0>
G23     312     155     dataCEOE_h<1>
G24     ???     156     dataCEOE_h<2>

H1      037     157     check_h<4>
H2      036     158     check_h<18>
H3      035     159     check_h<0>
H4      034     160     check_h<14>
H5      361     161     VSS plane
H6      352     162     VDD plane
H19     N/A     163     VSS plane
H20     428     164     VDD plane
H21     310     165     dataCEOE_h<3>
H22     307     166     tagCtlWE_h
H23     142     167     VDD plane
H24     306     168     cWMask_h<0>

J1      042     169     data_h<89>
J2      118     170     VDD plane
J3      041     171     data_h<26>
J4      040     172     data_h<90>
J5      344     173     VDD plane
J6      303     174     VSS plane
J19     422     175     VDD plane
J20     N/A     176     VSS plane
J21     305     177     cWMask_h<1>
J22     304     178     cWMask_h<2>
J23     301     179     cWMask_h<3>
J24     300     180     cWMask_h<4>

K1      048     181     data_h<87>
K2      045     182     data_h<24>
K3      044     183     data_h<88>
K4      043     184     data_h<25>
K5      345     185     VSS plane
K6      338     186     VDD plane
K19     423     187     VSS plane
K20     416     188     VDD plane
K21     299     189     cWMask_h<5>
K22     298     190     cWMask_h<6>
K23     147     191     VSS plane
K24     295     192     cWMask_h<7>


L1      052     193     check_h<19>
L2      103     194     VSS plane
L3      051     195     data_h<22>
L4      050     196     data_h<86>
L5      049     197     data_h<23>
L6      339     198     VSS plane
L19     408     199     VDD plane
L20     294     200     dataWE_h<0>
L21     293     201     dataWE_h<1>
L22     292     202     dataWE_h<2>
L23     289     203     dataWE_h<3>
L24     288     204     dMapWE_h

M1      059     205     data_h<20>
M2      058     206     data_h<84>
M3      057     207     data_h<21>
M4      056     208     data_h<85>
M5      053     209     check_h<5>
M6      332     210     VDD plane
M19     417     211     VSS plane
M20     287     212     cReq_h<0>
M21     286     213     cReq_h<1>
M22     283     214     cReq_h<2>
M23     140     215     VDD plane
M24     282     216     spare<0>

N1      060     217     data_h<83>
N2      110     218     VDD plane
N3      061     219     data_h<19>
N4      064     220     data_h<82>
N5      065     221     data_h<18>
N6      333     222     VSS plane
N19     400     223     VDD plane
N20     279     224     tagOk_l
N21     276     225     tagOk_h
N22     277     226     dataA_h<4>
N23     280     227     dataA_h<3>
N24     281     228     tagCEOE_h

P1      066     229     data_h<81>
P2      067     230     data_h<17>
P3      068     231     data_h<80>
P4      069     232     data_h<16>
P5      072     233     data_h<79>
P6      326     234     VDD plane
P19     409     235     VSS plane
P20     269     236     tagCtlS_h
P21     270     237     tagCtlD_h
P22     271     238     tagCtlP_h
P23     145     239     VSS plane
P24     274     240     tagEq_l





R1      073     241     data_h<15>
R2      095     242     VSS plane
R3      074     243     data_h<78>
R4      075     244     data_h<14>
R5      320     245     VDD plane
R6      327     246     VSS plane
R19     392     247     VDD plane
R20     401     248     VSS plane
R21     263     249     tagadr_h<19>
R22     264     250     tagadr_h<18>
R23     265     251     tagadr_h<17>
R24     268     252     tagCtlV_h

T1      076     253     check_h<17>
T2      077     254     check_h<3>
T3      080     255     data_h<77>
T4      081     256     data_h<13>
T5      321     257     VSS plane
T6      314     258     VDD plane
T19     393     259     VSS plane
T20     384     260     VDD plane
T21     258     261     tagadr_h<22>
T22     259     262     tagadr_h<21>
T23     ???     263     VDD plane
T24     262     264     tagadr_h<20>

U1      082     265     data_h<76>
U2      102     266     VDD plane
U3      083     267     data_h<12>
U4      084     268     data_h<75>
U5      308     269     VDD plane
U6      315     270     VSS plane
U19     376     271     VDD plane
U20     385     272     VSS plane
U21     252     273     tagadr_h<26>
U22     203     274     tagadr_h<25>
U23     256     275     tagadr_h<24>
U24     257     276     tagadr_h<23>

W1      085     277     data_h<11>
W2      088     278     data_h<74>
W3      089     279     data_h<10>
W4      090     280     data_h<73>
W5      309     281     VSS plane
W6      302     282     VDD plane
W19     377     283     VSS plane
W20     368     284     VDD plane
W21     247     285     tagadr_h<29>
W22     250     286     tagadr_h<28>
W23     143     287     VSS plane
W24     251     288     tagadr_h<27>


X1 
X2 
X3 
x4 
X5 
X6 
X7 
X8 
X9 
X10 
X11 
X12 
X13 
X14 
X15 
X16 
X17 
X18 
X19 
X20 
X21 
X22 
X23 
X24 


Y1 
Y2 
Y3 
Y4 
Y5 
Y6 
Y7 
Y8 
Yo 
Y10 
Y11 
Y12 
Y13 
Y14 
Y15 
Y16 
“iq 
Y18 
Y19 
Y20 
Y21 
Y22 
Y23 
Y24 


8-8 Pinout 


O91 
087 
092 
099 
154 
163 
168 
LES 
139 
141 
180 
167 
169 
199 
198 
211 
210 
219 
218 
227 
240 
244 


245 


246 


093 
096 
097 
106 
161 
166 
165 
170 
181 
174 
187 
186 
193 
192 
205 
204 
215 
214 
223 
222 
232 
237 
t32 
241 


289 


290 
291 
292 
293 
294 
295 
296 


297 


298 
299 
300 


301° 


302 
303 
304 
305 
306 
307 
308 
309 
310 
311 
oiZ 


Sis 
314 
315 
316 
317 
318 
319 
320 
a2. 
322 
323 
324 
325 
326 
327 
328 
329 
330 
331 
332 
333 
334 
339 
336 
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data_h<9> 
VSS plane 
data_h<72> 
check_h<6> 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
testClkin_h 
testClkin 1 
VDD plane 
clkIn_h 
clkIn_ 1 

VSS plane 
VDD plane 
VSS plane 
VDD plane 


VSS plane 


VDD plane 
VSS plane 
tagadrP_h 
tagadr_h<32> 
tagadr_h<31> 
tagadr_h<30> 


data_h<8> 
data_h<71> 
data_h<7> 
data_h<68> 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
VSS plane 
VDD plane 
adr_h<8> 
adr_h<5> 
VDD plane 
tagadr_h<33> 





AA1 
AA2 
AA3 
AA4 
AA5 
AA6 
AA7 
AA8 
AA9 
AA10 
AA11 
AA12 
AA13 
AA14 
AA15 
AA16 
AA17 
AA18 
AA19 
AA20 
AA21 
AA22 
AA23 
AA24 


AB1 
AB2 
AB3 
AB4 
AB5 
AB6 
AB7 
AB8 
AB9 
AB10 
AB11 
AB12 
AB13 
AB14 
AB15 
AB16 
AB17 
AB18 
AB19 
AB20 
AB21 
AB22 
AB23 
AB24 


098 
094 
105 
112 
se 
LoL 
125 
136 
144 
146 
157 
162 
164 
171 
182 
188 
191 
197 
202 
213 
217 
225 
233 


236 


100 
104 
108 
113 
116 
122 
129 
137 
148 
149 
133 
159 
160 
L712 
179 
185 
190 
196 
201 
207 
212 
220 
127 
229 


337 
338 
339 
340 
341 
342 
343 
344 
345 
346 
347 
348 
349 
350 
351 
352 
353 
354 
355 
356 
357 
358 
359 
360 


361 
362 
363 
364 
365 
366 
367 
368 
369 
370 
371 
372 
373 
374 
375 
376 
377 
378 
379 
380 
381 
382 
383 
384 
check_h<20> 
VDD plane 
data_h<5> 
data_h<66> 
data_h<0> 
iAdr_h<6> 
iAdr_h<10> 
vRef 
sysClkOut2_h 
sysClkOut2_l 
spare<6> 
sysClkOut1_h 
sysClkOut1_l 
cont_l 

irq_h<5> 
spare<8> 

adr_h<31> 
adr_h<27> 
adr_h<24> 
adr_h<17> 
adr_h<15> 
adr_h<11> 
adr_h<7> 

adr_h<6> 


data_h<70> 
data_h<69> 
data_h<67> 
data_h<2> 
data_h<64> 
iAdr_h<7> 
iAdr_h<12> 
reset_l 
sRomD_h 
sRomOE_l 
cpuClkOut_h 
dcOk_h 
triState_l 
icMode_h<0> 
irq_h<4> 
perf_cnt_h<0> 
adr_h<32> 
adr_h<28> 
adr_h<25> 
adr_h<21> 
adr_h<18> 
adr_h<14> 
VSS plane 
adr_h<9> 




AC1 
AC2 
AC3 
AC4 
AC5 
AC6 
AC7 
AC8 
AC9 
AC10 
AC11 
AC12 
AC13 
AC14 
AC15 
AC16 
AC17 
AC18 
AC19 
AC20 
AC21 
AC22 
AC23 
AC24 


AD2 
AD3 
AD4 
AD5 
AD6 
AD7 
AD8 
AD9 
AD10 
AD11 
AD12 
AD13 
AD14 
AD15 
AD16 
AD17 
AD18 
AD19 
AD20 
AD21 
AD22 
AD23 
AD24 


101 
001 
006 
114 
007 
123 
012 
128 
013 
150 
018 
158 
019 
177 
024 
184 
025 
195 
030 
206 
031 
216 
038 
228 


107 


109 
115 
120 
124 
131 
HERS Fo! 
130 
134 
151 
156 
L73 
176 
178 
183 
189 
194 
200 
203 
208 
209 
221 
224 




385 
386 
387 
388 
389 
390 
391 
392 
393 
394 
395 
396 
397 
398 
399 
400 
401 


402 


403 
404 
405 
406 
407 
408 


409 
410 
411 
412 
413 
414 


415 


416 
417 
418 
419 
420 
421 
422 
423 
424 
425 
426 
427 
428 
429 
430 
431 
data_h<6> 
VSS plane 
VDD plane 
data_h<65> 
VSS plane 
iAdr_h<8> 
VDD plane 
iAdr_h<11> 
VSS plane 
sRomClk_h 
VDD plane 
spare<5> 
VSS plane 
irq_h<2> 
VDD plane 
perf_cnt_h<1> 
VSS plane 
adr_h<29> 
VDD plane 
adr_h<22> 
VSS plane 


adr_h<16> 


VDD plane 
adr_h<10> 


data_h<4> 
data_h<3> 
data_h<1> 
iAdr_h<5> 
iAdr_h<9> 
spare<1> 
eclOut_h 
dInvReq_h 
spare<2> 
spare<4> 
icMode_h<1> 
irq_h<0> 
irq_h<1> 
irq_h<3> 
spare<7> 
adr_h<33> 
adr_h<30> 
adr_h<26> 
adr_h<23> 
adr_h<20> 
adr_h<19> 
adr_h<13> 
adr_h<12> 


8.4 EV4/NVAX+ Pinout Differences 


The following table shows the differences between the EV4 chip pinout and the NVAX+ chip pinout. 
The NVAX+ pins pp_data_h<7:6> and pp_data_h<4:3> are normally tristated. NVAX+ will only 
drive them during chip test. 


PGA    PAD   PIN         EV4                        NVAX+
LOC.   No.   No.   TYPE  NAME              TYPE     NAME

E22    324   118   I     dWSel_h<0>        I        pp_cmd_h<0>
E23    323   119   I     dWSel_h<1>        I        pp_cmd_h<1>
E21    329   117   I     dRAck_h<1>        I        pp_cmd_h<2>
L24    288   204   O     dMapWE_h          O        pMapWE_h<0>
AD9    130   416   I     dInvReq_h         I        pInvReq_h<0>
M24    282   216   N     spare<0>          O        pMapWE_h<1>
AD7    131   414   N     spare<1>          I        clk_rst_h
AD10   134   417   N     spare<2>          O        pp_data_h<0>
C24    331   072   N     spare<3>          I        pInvReq_h<1>
AD11   151   418   N     spare<4>          O        pp_data_h<2>
AC12   158   396   N     spare<5>          I        osc16M_h
AA11   157   347   N     spare<6>          O        pp_data_h<1>
AD16   183   423   N     spare<7>          O        pp_data_h<5>
AA16   188   352   N     spare<8>          O        pp_data_h<11>
AB16   185   376   I     perf_cnt_h<0>     O        pp_data_h<3>
AC16   184   400   I     perf_cnt_h<1>     O        pp_data_h<4>
AD8    139   415   I     eclOut_h          I        test_mode_h
R23    265   251   I     tagadr_h<17>      B        tagadr_h<17>
R22    264   250   I     tagadr_h<18>      B        tagadr_h<18>
R21    263   249   I     tagadr_h<19>      B        tagadr_h<19>
X22    244   310   I     tagadr_h<32>      O        pp_data_h<6>
Y24    241   336   I     tagadr_h<33>      O        pp_data_h<7>
Y22    237   334   B     adr_h<5>          O        adr_h<5>
AA24   236   360   B     adr_h<6>          O        adr_h<6>
AA23   233   359   B     adr_h<7>          O        adr_h<7>
Y21    232   333   B     adr_h<8>          O        adr_h<8>
AB24   229   384   B     adr_h<9>          O        adr_h<9>
AC24   228   408   B     adr_h<10>         O        adr_h<10>
AA22   225   358   B     adr_h<11>         O        adr_h<11>
AD24   224   431   B     adr_h<12>         O        adr_h<12>
AD23   221   430   B     adr_h<13>         O        adr_h<13>
AB22   220   382   B     adr_h<14>         O        adr_h<14>
AA21   217   357   B     adr_h<15>         O        adr_h<15>
AC22   216   406   B     adr_h<16>         O        adr_h<16>







Appendix A 
EV3 and EV4 Chip Summary 


The following two pages are a quick summary of the EV3 and EV4 chips. 
Table A-1: EV3 Chip Summary and Micro-architecture 


Feature                  Description

Cycle Time               10 ns
Process Technology       CMOS3 1.0u CMOS
Transistor count         550K
Die Size                 14.1mm X 16.8mm; 559mils X 657mils
Package                  431 pin PPGA; 24 X 24, 100 mil pin pitch
No. Chip Pads            428
No. Signal Pins          291
Typ Power Dissipation    10W @ 10ns cycles, Vdd=3.45V
Clocking input           200MHz differential @ 10ns cycles
Virtual address size     43 bits
Physical address size    34 bits
Page size                8Kbytes
Issue rate               2 instructions per cycle
Pipeline                 7 stage: fetch, swap, I0, I1, A1, A2, and write
On-chip Dcache           1Kbyte, physical, direct-mapped, write-thru, 32-byte line, 32-byte fill
On-chip Icache           1Kbyte, physical, direct-mapped, 32-byte line, 32-byte fill, No ASN
On-chip DTB              32-entry, fully-associative, NLU replacement, 8K pages, 1-bit ASM
                         4-entry, fully-associative, NLU replacement, 512 * 8K pages, 1-bit ASM
On-chip ITB              8-entry, fully-associative, NLU replacement, 1-bit ASM
FPU                      No on-chip FPU
Bus                      Separate data and address bus. 128-bit/64-bit Data Bus
Serial ROM Interface     Allows the chip to access a serial ROM







Table A-2: EV4 Chip Summary and Micro-architecture 


Feature                  Description

Cycle Time               6.6ns nominal; 5ns fast bin; 10ns slow bin
Process Technology       CMOS4 .75u CMOS
Transistor count         1.8 million
Die Size                 14.1mm X 16.8mm; 555mils X 661mils
Package                  431 pin PGA; 24 X 24, 100 mil pin pitch
No. Chip Pads            428
No. Signal Pins          291
Typ Power Dissipation    23W @ 6.6ns cycles, Vdd=3.45V
Typ Power Dissipation    29.5W @ 5ns cycles, Vdd=3.45V
Clocking input           300MHz differential @ 6.6ns cycles
Clocking input           400MHz differential @ 5ns cycles
Virtual address size     43 bits
Physical address size    34 bits
Page size                8Kbytes
Issue rate               2 instructions per cycle
Integer Pipeline         7 stage: fetch, swap, I0, I1, A1, A2, and IWR
Floating Pipeline        10 stage: fetch, swap, I0, I1, F1, F2, F3, F4, F5 and FWR
On-chip Dcache           8Kbyte, physical, direct-mapped, write-thru, 32-byte line, 32-byte fill
On-chip Icache           8Kbyte, physical, direct-mapped, 32-byte line, 32-byte fill, 64 ASNs
On-chip DTB              32-entry, fully-associative, NLU replacement, 8K pages, 1-bit ASM
                         4-entry, fully-associative, NLU replacement, 512 * 8K pages, 1-bit ASM
On-chip ITB              8-entry, fully-associative, NLU replacement, 1-bit ASM
FPU                      On-chip FPU supports both IEEE and DEC floating point
Bus                      Separate data and address bus. 128-bit/64-bit Data Bus
Serial ROM Interface     Allows the chip to access a serial ROM
From: SEGAD2::MCLELLAN "Ed McLellan DTN 225-4790 HLO2-3/J03 17-Oct-1991 
1507" 17-OCT-1991 15:09:16.67 

To: @OCT91 SORTLOC.DIS 

CC: MCLELLAN 

Subj: Updates to EV3/4 Specification V2.0 


Digital Equipment Corporation CONFIDENTIAL 


INTEROFFICE MEMORANDUM 


TO: Distribution DATE: 17-Oct-91 
FROM: Ed McLellan 
DEPT: SEG/AD 
EXT : 225-4790 
L/MS: HLO2-3/J03 
ENET: AD::MCLELLAN 


SUBJ: EV3/EV4 Specification 2.0 Updates 


EV4 Pass 2 design is well under way; however, some of the design 
modifications will alter functionality as described in the EV3/EV4 
Specification Version 2.0. This memo highlights those areas of change 
in order to disseminate the information as quickly as possible. Since 
the design is not fully complete, additional modifications, although 
unlikely, may be necessary. 


1 PAGE 2-2 SECTION 2.3.2 ITB 


CAUSE: increase maximum ITB translatable address space 
=> modify the first paragraph to begin with "The EV3 Ibox contains..." 
=> add the following text after the first paragraph 


The EV4 Ibox contains an 8 entry fully associative translation 
buffer which caches recently used instruction-stream page table 
entries for 8Kbyte pages, and a 4 entry fully associative translation 
buffer which supports the largest granularity hint option (512*8Kbyte 
pages) as described in Section 6.5 of the ALPHA SRM V4.0. Both 
translation buffers use a not-last-used replacement algorithm. They 
are hereafter referred to as the small-page and large-page ITBs, 
respectively. 


In addition, EV4 provides an extension referred to as the super 
page, which can be enabled by the MAP bit in the ICCSR IPR. Super 
page mappings provide one-to-one virtual PC<33:13> to physical 
PC<33:13> translation when virtual address bits <42:41> = 2. This 
function essentially maps the entire physical address space multiple 
times over to one quadrant of the virtual address space defined by 
<42:41> = 2. When translating through the super page, the PTE[ASM] 
bit used in the Icache is always set. Access to the super page 
mapping is only allowed while executing in kernel mode. 




The ITBs are filled and maintained by PALcode. The operating 
system, via PALcode, is responsible for ensuring that a virtual 
address can be mapped through only one large-page, small-page, or 
super-page ITB entry at a time. 


2 PAGE 2-4 SECTION 2.5.1 DTB 


CAUSE: increase maximum DTB translatable address space 
=> modify the first paragraph to begin with "EV3 contains a..." 
=> add the following text after the first paragraph 


EV4 contains a 32-entry fully associative translation buffer 
which caches recently used data-stream page table entries and supports 
all four variants of the granularity hint option as described in 
Section 6.5 of the ALPHA SRM V4.0. The operating system, via PALcode, 
is responsible for ensuring that translation buffer entries, including 
super page regions, do not map overlapping virtual address regions at 
the same time. 


In addition, EV4 provides an extension referred to as the super 
page, which can be enabled via ABOX_CTL<5:4>. Super page mappings 
provide virtual to physical address translation for two regions of the 
virtual address space. The first enables super page mapping when 
virtual address bits <42:41> = 2. In this mode, the entire physical 
address space is mapped multiple times over to one quadrant of the 
virtual address space defined by VA<42:41> = 2. The second super page 
mode maps a 30-bit region of the total physical address space defined 
by PA<33:30> = 0 into a single corresponding region of virtual space 
defined by VA<42:30> = 1FFE(Hex). Super page translation is only 
allowed in kernel mode. 


3 PAGE 3-8 SECTION 3.4 PAL ENTRY POINTS 


CAUSE: provide larger CALLPAL code regions and PC+4 return addresses 
=> replace Table 3-5 CALLPAL entry with the following 


    CALLPAL EV3   pipe stage[5]   2000,20,40,thru   256 locations based 
                                  3FE0              on instruction[7:0] 
                                                    see description below 


=> add new Table 3-5 entry 

    CALLPAL EV4   pipe stage[5]   2000,40,80,C0     128 locations based 
                                  thru 3FC0         on instruction[7,5:0] 
                                                    see description below 


=> add text describing CALL_PAL instruction hardware dispatches 


PALcode functions are implemented via the CALL_PAL instruction. 
CALL_PAL instructions cause exceptions in the hardware. As with all 
exceptions, the EXC_ADDR register is loaded by hardware with a 
possible return address. EV3 always loads this register with the 
address of the instruction which caused the exception, or was 
executing, but not complete, at the time of the exception or trap. 
EV4 provides an improvement for CALL_PAL exceptions. CALL_PAL 
exceptions do not load the EXC_ADDR register with the address of the 
CALL_PAL instruction. Rather, they load the EXC_ADDR register with 
the address of the instruction following the CALL_PAL. For this 
reason, EV4 PALcode supporting the desired PAL mode function need not 
increment the EXC_ADDR register before executing a HW_REI instruction 
to return to native mode. This feature requires special handling in 
the arithmetic trap and machine check PALcode flows for EV4. See 
Section 3.8.5 EXC_ADDR for more complete information. 


To improve speed of execution, a limited number of CALL_PAL 
instructions are directly supported in hardware with dispatches to 
specific address offsets. EV3 provides the first 128 privileged and 
128 unprivileged CALL_PAL instructions with hardware PAL entry points 
starting at address offset 2000 (Hex) and continuing through 3FE0 (Hex). 
Addresses are generated in the following manner. Note that <7> 
distinguishes privileged instruction encodings. 


     Offset (Hex) = 2000 + (Instruction<7:0> shift left 5) 


EV3 CALL_PAL instructions that do not fall within the range 
[00000000:000000FF] result in OPCDEC exceptions. In addition, 
CALL_PAL instructions that fall within the range [00000000:0000007F] 
while EV3 is not executing in kernel mode result in OPCDEC exceptions. 


EV4 provides the first 64 privileged and 64 unprivileged CALL_PAL 
instructions with larger code regions than EV3: 64 bytes vs. 32 bytes. 
This produces hardware PAL entry points as described below. 

Privileged CALL_PAL Instructions [00000000:0000003F] 

     Offset (Hex) = 2000 + (<5:0> shift left 6) 

Unprivileged CALL_PAL Instructions [00000080:000000BF] 

     Offset (Hex) = 3000 + (<5:0> shift left 6) 


EV4 CALL_PAL instructions that do not fall within the ranges 
[00000000:0000003F] and [00000080:000000BF] result in an OPCDEC 
exception. In addition, CALL_PAL instructions that fall within the 
range [00000000:0000003F] while EV4 is not executing in kernel mode 
will result in an OPCDEC exception. 
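

The dispatch rules above can be sketched in a few lines of Python. This
is only an illustrative model of the offset arithmetic and the OPCDEC
conditions stated in this memo; the function names, the `kernel_mode`
argument and the `OPCDEC` marker string are assumptions introduced for
the example and are not part of the specification.

```python
OPCDEC = "OPCDEC"  # hypothetical marker for an OPCDEC exception


def callpal_offset_ev3(func, kernel_mode):
    """Return the EV3 PAL entry offset for a CALL_PAL function code."""
    if not 0x00 <= func <= 0xFF:
        return OPCDEC                      # outside [00:FF]
    if func <= 0x7F and not kernel_mode:
        return OPCDEC                      # privileged code, not in kernel
    return 0x2000 + ((func & 0xFF) << 5)   # 32-byte code regions


def callpal_offset_ev4(func, kernel_mode):
    """Return the EV4 PAL entry offset for a CALL_PAL function code."""
    if 0x00 <= func <= 0x3F:               # privileged range
        if not kernel_mode:
            return OPCDEC
        return 0x2000 + ((func & 0x3F) << 6)   # 64-byte code regions
    if 0x80 <= func <= 0xBF:               # unprivileged range
        return 0x3000 + ((func & 0x3F) << 6)
    return OPCDEC


if __name__ == "__main__":
    for f in (0x00, 0x3F, 0x80, 0xBF):
        print(hex(f), hex(callpal_offset_ev4(f, kernel_mode=True)))
```

The last EV3 entry (function FF) lands at 3FE0 and the last EV4 entry
(function BF) at 3FC0, matching the Table 3-5 ranges given earlier.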
4 PAGE 3-10 SECTION 3.5.1 EVX PAL RESTRICTIONS 


CAUSE: respecify PALcode restriction regarding PAL_TEMP use 
=> replace first bullet item with the following 


HW_MFPR instructions reading any PAL_TEMP register can never occur 
exactly two cycles after a HW_MTPR instruction writing any PAL_TEMP 
register. The simple solution results in code of the form: 


     HW_MTPR Rx, PAL_R0     ; Write PAL temp 0 
     HW_MFPR R31, 0         ; NOP mxpr instruction 
     HW_MFPR R31, 0         ; NOP mxpr instruction 
     HW_MFPR Ry, PAL_R0     ; Read PAL temp 0 


The above code guarantees 3 cycles of delay after the write before 
the read. It is also possible to make use of the cycle immediately 
following a HW_MTPR to execute a HW_MFPR instruction to the same 
(accomplishing a swap) or a different PAL_TEMP register. The swap 
operation only occurs if the HW_MFPR instruction immediately follows 
the HW_MTPR. This timing requires great care and knowledge of the 
pipeline to ensure that the second instruction does not stall for 
one or more cycles. Use of the slot to accomplish a read from a 
different PAL_TEMP register requires that the second instruction not 
stall for exactly one cycle. This is much easier to ensure. A 
HW_MFPR instruction may stall for a single cycle as a result of a 
write after write conflict. 


=> add new PAL Restriction 17. 


17. PALcode that writes multiple ITB entries must write the entry 
that maps the address contained in the EXC_ADDR register last. 


5 PAGE 3-11 SECTION 3.5.1 EVX PAL RESTRICTIONS 


CAUSE: Spec correction 
=> Change restriction nine as follows: 


The sequence HW_MTPR PTE, HW_MTPR TAG is NOT allowed. At least 
two null cycles must occur between HW_MTPR PTE and HW_MTPR TAG. 


6 PAGE 3-13 SECTION 3.5.3 EV4 SPECIFIC PALMODE RESTRICTIONS 


CAUSE: add new EV4 restrictions 
=> add new PAL Restrictions 3 thru 5. 


3. At least one cycle of delay must occur after a HW_MTPR TB_CTL 
   before either a HW_MTPR ITB_PTE or a HW_MFPR ITB_PTE to allow 
   setup of the ITB large page/small page decode. 


4. The first cycle (the first one or two instructions) at all 
   PALcode entry points may not execute a conditional branch 
   instruction or any other instruction that uses the jsr stack 
   hardware. This includes instructions JSR, JMP, RET, COROUTINE, 
   BSR, HW_REI and all Bxx opcodes except BR, which is allowed. 


5. The following table indicates the number of cycles required 
   after a HW_MTPR instruction before a subsequent HW_REI 
   instruction for the specified IPRs. These cycles can be 
   ensured by inserting one HW_MFPR R31,0 instruction or other 
   appropriate instruction(s) for each cycle of delay required 
   after the HW_MTPR. 


   IPR                   Cycles between HW_MTPR and HW_REI 

   xTBIS, ASM, ZAP 
   xIER 
   xIRR 
   ICCSR<FPE> 
   ICCSR<ASN> 
   FLUSH_IC[ASM] 


7 PAGE 3-12 SECTION 3.5.2 EV3 SPECIFIC PALMODE RESTRICTIONS 


CAUSE: spec correction 
=> replace Table 3-7 with the following 


     MTPR-Write     MFPR-Read 

     ITB_PTE        ITB_PTE_TEMP 
     ICCSR          ICCSR 
     EXCSUM         EXCSUM 
     PS             PS 
     HIER           xIER 
     SIER           xIER 
     ASTER          xIER 
     SLCLR          xIRR 
     SIRR           xIRR 
     ASTRR          xIRR 


8 PAGE 3-14 TABLE 3-8 IPR RESET STATE 


CAUSE: spec correction 
=> replace first entry with the following 


     ICCSR          cleared except ASN, PC0, PC1 
9 PAGE 3-19 SECTION 3.8.3 ICCSR 


CAUSE: spec correction, add super page enable 
=> add text 


EV3 can be distinguished from EV4 by writing a one to ICCSR<1> and 
reading back ICCSR<3>. EV3 returns ICCSR<3> equal to one and EV4 
reads back zero. 


=> modify Write format diagram 
bit 41 - MAP 

=> modify Read format diagram 
bit 22 - MAP 

=> add description to HWE Field 


Use of the HW_MTPR instruction to update the EXC_ADDR IPR while in 
native mode is restricted to values with bit<0> equal to 0. The 
combination of native mode and EXC_ADDR<0> equal to one causes 
UNDEFINED behavior. 


=> add Field to Table 3-9 ICCSR (EV4 only) 


MAP RW,0 If set allows super page I-stream memory mapping of 
VPC<33:13> directly to Physical PC<33:13> essentially 
bypassing ITB for VPC addresses containing VPC<42:41>= 2. 
Super page mapping is allowed in Kernel mode only. The 
ASM bit is always set. The MAP bit is available in EV4 only. 


10 PAGE 3-22 SECTION 3.8.5 EXC_ADDR 


CAUSE: add support for CALL PAL automatic load of PC+4 return address 
=> replace the first two paragraphs with the following 


The EXC_ADDR register is a read/write register used to restart 
the machine after exceptions or interrupts. The EXC_ADDR register can 
be read and written by software via the HW_MTPR instruction as well as 
being written directly by hardware. The HW_REI instruction executes a 
jump to the address contained in the EXC_ADDR register. The EXC_ADDR 
register is written by hardware after an exception to provide a return 
address for PALcode. The instruction pointed to by the EXC_ADDR 
register did not complete execution. Since the PC is longword 
aligned, the lsb of the EXC_ADDR register is used to indicate PALmode 
to the hardware. When the lsb is clear, the HW_REI instruction 
executes a jump to native (non-PAL) mode, enabling address translation. 


In EV3, the address written to the EXC_ADDR register after an 
exception is always the PC of the instruction that caused the 
exception or the PC of the instruction that was currently executing, 
but not complete, at the time of the exception or trap. As a special 
case in EV4 only, CALL_PAL exceptions load the EXC_ADDR with the PC of 
the instruction following the CALL_PAL. This function allows CALL_PAL 
service routines to return without needing to increment the value in 
the EXC_ADDR register. 


This feature, however, requires careful treatment in PALcode. 
Arithmetic traps and machine check exceptions can preempt CALL_PAL 
exceptions resulting in an incorrect value being saved in the EXC_ADDR 
register. In the cases of an arithmetic trap or machine check 
exception, and only in those cases, EXC_ADDR<1> takes on special 
meaning. PALcode servicing these two exceptions should interpret a 
zero in EXC_ADDR<1> as indicating that the PC in EXC_ADDR<63:2> is too 
large by a value of 4 bytes and subtract 4 before executing a HW_REI 
from this address. PALcode should interpret a one in EXC_ADDR<1> as 
indicating that the PC in EXC_ADDR<63:2> is correct and clear the 
value of EXC_ADDR<1>. All other PALcode entry points except reset can 
expect EXC_ADDR<1> to be zero. 


This logic allows the following code sequence to conditionally 
subtract 4 from the address in the EXC_ADDR register without use of an 
additional register. This code sequence should be present in 
arithmetic trap and machine check flows only. 


     HW_MFPR Rx, EXC_ADDR   ; read EXC_ADDR into GPR 
     SUBQ Rx, #2, Rx        ; subtract 2 causing borrow if <1>=0 
     BIC Rx, #2, Rx         ; clear <1> 
     HW_MTPR Rx, EXC_ADDR   ; write back to EXC_ADDR 
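

The effect of the SUBQ/BIC pair can be checked with a short Python
model of 64-bit register arithmetic. This is a sketch for illustration
only; the function name `adjust_exc_addr` is an assumption, and the
model covers just the two arithmetic instructions, not the IPR reads
and writes around them.

```python
MASK64 = (1 << 64) - 1  # model a 64-bit general purpose register


def adjust_exc_addr(exc_addr):
    """Model SUBQ Rx,#2,Rx followed by BIC Rx,#2,Rx on a 64-bit value."""
    rx = (exc_addr - 2) & MASK64   # SUBQ: borrows from <2> when <1> is 0
    rx &= ~2 & MASK64              # BIC: clear bit <1>
    return rx


if __name__ == "__main__":
    # <1> = 0: the PC is too large by 4, so 4 is subtracted.
    print(hex(adjust_exc_addr(0x1004)))
    # <1> = 1: the PC is already correct, only bit <1> is cleared.
    print(hex(adjust_exc_addr(0x1006)))
```

When bit <1> is zero the subtraction borrows from bit <2>, so the net
change is minus 4; when bit <1> is one the subtraction only clears that
bit. Bit <0>, the PALmode indicator, is preserved in both cases.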


EV3 does not provide an advanced EXC_ADDR value after CALL_PAL 
exceptions. It also does not guarantee a zero value in EXC_ADDR<1> 
upon reads of that IPR. PALcode must explicitly clear this bit before 
pushing the exception address on the stack. 


11 PAGE 3-31 SECTION 3.9.1 DTB_CTL 


CAUSE: register now controls both ITB and DTB granularity hints 
=> Change header title to TB_CTL 
=> Replace text with the following 


The granularity hint (GH) field selects between the EVx TB page 
mapping sizes. EV3 provides two sizes in the DTB, selectable through 
this register and only the smallest size (8Kbytes) in the ITB. EV4 
provides two sizes in the ITB and all four sizes in the DTB. When 
only two sizes are provided, the large-page-select (GH=11(bin)) field 
selects the largest mapping size (512 * 8Kbytes) and all other values 
select the smallest (8Kbyte) size. The GH field affects both reads 
and writes to the ITB and DTB in EV4. It only affects use of the DTB 
in EV3. The TB_CTL register itself is write only. 
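

As an illustration of the size selection described above, the sketch
below maps a GH value to a mapping size in bytes. It assumes the four
granularity hint variants are 1, 8, 64 and 512 pages of 8 Kbytes, which
is how the Alpha architecture defines them; the function name and the
`four_sizes` flag are hypothetical and exist only for this example.

```python
PAGE_BYTES = 8 * 1024  # base 8 Kbyte page


def mapping_size(gh, four_sizes):
    """Return the TB mapping size in bytes for a 2-bit GH value.

    four_sizes=True models a TB with all four sizes (the EV4 DTB);
    four_sizes=False models a TB with only two sizes (the EV4 ITB
    and the EV3 DTB).
    """
    if not 0 <= gh <= 3:
        raise ValueError("GH is a 2-bit field")
    if four_sizes:
        return PAGE_BYTES * 8 ** gh       # 1, 8, 64 or 512 pages
    # Two-size TB: only GH=11(bin) selects the large page.
    return PAGE_BYTES * 512 if gh == 0b11 else PAGE_BYTES


if __name__ == "__main__":
    for gh in range(4):
        print(gh, mapping_size(gh, True), mapping_size(gh, False))
```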
12 PAGE 4-22 SECTION 4.4.5 WRITE BLOCK 


CAUSE: spec correction 
=> Change the description for cycle one as follows: 


1. The WRITE BLOCK cycle begins. EVx has already placed the 
   address of the block on adr_h. EVx places the longword valid 
   masks on cWMask_h and a WRITE BLOCK command code on cReq_h. 
   EVx will clear dataA_h[4..3] and tagCEOE_h no later than one 
   CPU cycle after the system clock edge at which the transaction 
   begins. EVx clears dataCEOE_h[3..0] at least one CPU cycle 
   before the system clock edge at which the transaction begins. 


13 PAGE 3-34 SECTION 3.9.11 ABOX_CTL 


CAUSE: add big endian, super page options 
=> add new bit<6> big endian enable (EV4 only) 


EV4 provides limited hardware support for big endian data formats via 
bit <6> of the ABOX_CTL register. This bit, when set, inverts 
physical address bit <2> for all D-stream references. It is intended 
that chip endian mode be selected during initialization PALcode only. 

=> add new bit<5> VA<42:41> super page enable (EV4 only) 

This bit, when set, enables one-to-one super page mapping of D-stream 
virtual addresses with VA<33:13> directly to physical addresses 
PA<33:13>, if virtual address bits VA<42:41> = 2. Virtual address 
bits VA<40:34> are ignored in this translation. Access is only 
allowed in kernel mode. 

=> add new bit<4> VA<42:30> super page enable (EV4 only) 

This bit, when set, enables one-to-one super page mapping of D-stream 
virtual addresses with VA<42:30> = 1FFE(Hex) to physical addresses 
with PA<33:30> = 0(Hex). Access is only allowed in kernel mode. 
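

The two D-stream super page modes can be summarized with a small Python
model. It is a sketch under stated assumptions: the function and
parameter names are hypothetical, the page offset VA<12:0> is assumed
to pass through unchanged, and a return value of None simply stands
for "no super page translation applies" (the normal DTB path is not
modeled).

```python
def dstream_super_page(va, kernel_mode, spe_41, spe_30):
    """Return the physical address for a super page hit, else None.

    spe_41 models ABOX_CTL<5> (VA<42:41> = 2 region).
    spe_30 models ABOX_CTL<4> (VA<42:30> = 1FFE(Hex) region).
    """
    if not kernel_mode:
        return None                       # kernel mode only
    va &= (1 << 43) - 1                   # 43-bit virtual address
    if spe_41 and ((va >> 41) & 0x3) == 2:
        return va & ((1 << 34) - 1)       # PA<33:0> = VA<33:0>
    if spe_30 and ((va >> 30) & 0x1FFF) == 0x1FFE:
        return va & ((1 << 30) - 1)       # PA<33:30> = 0
    return None


if __name__ == "__main__":
    va = (2 << 41) | (0x7F << 34) | 0x12345678
    print(hex(dstream_super_page(va, True, True, False)))
```

Because VA<40:34> are ignored in the first mode, many virtual addresses
alias to the same physical address, which is the "mapped multiple times
over" behavior described in Section 2.5.1.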


14 PAGE 4-14 SECTION 4.2.7 EXTERNAL CYCLE CONTROL 
CAUSE: provide address [2] for I/O device longword read granularity 
SPECIAL NOTE: Consideration for final inclusion is still pending. 


=> replace the first paragraph with the following 


On READ BLOCK and LDxL cycles, the cWMask_h pins have additional 
information about the cache miss overloaded onto them. The 
cWMask_h[1:0] and cWMask_h[3] pins contain miss address bits [4:3] and 
[2] respectively. These additional address bits, which specify the 
longword that missed, are needed to implement longword granularity to 
I/O devices. 


15 PAGE 5-2 SECTION 5.1.4 SIGNAL PINS 


CAUSE: Correct clock input leakage spec 
=> replace first line of Power/Leakage section of table 5-1 


Icin Clock input Leakage 4 4 mA -0.5 < Vin < 3.6V 


16 PAGE 7-2 CHAPTER 7 PACKAGE 


CAUSE: Correct PGA location grid to conform to JEDEC standard, that is 
required by the ceramic PGA vendors. 


SPECIAL NOTE: This change in no way affects functionality. It simply 
updates pin labels to conform to JEDEC standards. All 
future references to PGA locations should use these labels. 


=> replace row label X with new label W 
=> replace row label W with new label V 


17 PAGE 8-2 SECTION 8.2 CHANGE HISTORY 
=> add line at bottom of table 
ejm 17-oct-91 modify labels to conform to JEDEC standard 


replace row label X with new label W 
replace row label W with new label V 


18 PAGES 8-7,8-8 SECTION 8.3 EV4 PINOUT 


=> replace PGA loc W1 - W24 with corresponding V1 - V24 labels 
=> replace PGA loc X1 - X24 with corresponding W1 - W24 labels 


19 PAGE 8-5 SECTION 8.3 EV4 PINOUT 
=> correct PAD no. at PGA loc H19 from N/A to 133 


H19 133 163 P VSS Plane 
20 PAGE A-3 


CAUSE: update table for pass 2 functionality 
=> replace corresponding lines with the following 


Cycle Time       6.6ns nominal; 5ns fast bin; 8ns slow bin 

On-chip DTB      32-entry, fully-associative, NLU replacement, 
                 full granularity hint support, 1-bit ASM 

On-chip ITB      8-entry, fully-associative, NLU replacement, 
                 8K pages, 1-bit ASM 

                 4-entry, fully-associative, NLU replacement, 
                 512 * 8K pages, 1-bit ASM 





From: AD::MEYER "Dirk Meyer HLO2-3/J3 225-6325 09-Feb-1993 1611" 9-FEB-1993 
16:18:36.78 

To: @EV45 

CC: 

Subj: EV45 definition, rev 1.1 


INTEROFFICE MEMORANDUM 


TO: Distribution DATE: 9-Feb-93 
FROM: Dirk Meyer 
DEPT: SEG/AD 
EXT : 225-6325 
L/MS: HLO2-3/J03 
ENET: AD::MEYER 


SUBJ: EV45 Definition - Rev 1.1 


OVERVIEW 

This memo describes the electrical and functional differences 
between EV45 and EV4 pass 3. It replaces a previous memo dated 
2-Nov-1992, and contains change bars to highlight changes and 
additions to that document. The substantive changes to EV45 from 
those previously described are: 


1. EV45 will contain a mode bit which when set will have the effect 
of asserting dInvReq_h<1> when dInvReq_h<0> is asserted. This 
will allow EV4-based systems which do not contain a DCache 
backmap to upgrade to EV45 operating in 16KB DCache mode with no 
module-level changes. Such systems were previously required to 
externally tie dInvReq_h<0> and dInvReq_h<1> together. 


2. EV45 will include a new operating mode which will enable LDx/L 
and STx/C instructions to be processed by EV45 using BCache-hit 
timing. This will result in better LDx/L and STx/C performance 
for systems which support this mode. 


3. The tagAdr_h<17> pin will be redefined to support the above 
operating mode; as a result, EV45 will not support a 128 KB 
BCache. 


4. The sRomClk_h divisor when loading the module-level serial ROM 
will be changed from 126 in EV4 to 254 in EV45 in order to 
ensure that existing EV4 serial ROM designs will work with a 
higher frequency EV45. 
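

The motivation for item 4 can be seen with a quick calculation. The
sketch below assumes that the sRomClk_h frequency is the CPU clock
frequency divided by the stated divisor; that relationship, the
function name, and the example frequencies are assumptions made for
illustration, not values taken from this memo.

```python
def srom_clk_mhz(cpu_mhz, divisor):
    """Serial ROM clock in MHz, assuming sRomClk_h = CPU clock / divisor."""
    return cpu_mhz / divisor


if __name__ == "__main__":
    # An EV4 at 180 MHz with the EV4 divisor of 126.
    print(round(srom_clk_mhz(180.0, 126), 3))
    # An EV45 at 300 MHz with the old and the new divisor.
    print(round(srom_clk_mhz(300.0, 126), 3))
    print(round(srom_clk_mhz(300.0, 254), 3))
```

Under that assumption, roughly doubling the divisor keeps the serial
ROM clock of a faster EV45 at or below the rate an EV4 serial ROM
design already handles.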
1 ELECTRICAL CHARACTERISTICS 


We expect the CMOS-5 technology to provide about a 1.5X clock 
frequency improvement over CMOS-4. Therefore, EV4’s speed bin 
points of 150 and 180 MHz should move to about 225 and 270 MHz, 
respectively, for EV45. Our goal is to ensure reliable operation up 
to 300MHz so that we can take advantage of any additional 
performance which CMOS-5 may provide. At a given clock frequency 
EV45’s power dissipation will be about 80 percent of EV4’s. The 
typical and maximum supply currents for EV45 can be estimated from 
the following equations: 


     Idd(Typ) = [ 116 mA/V +  9.56 mA/(V * MHz) * f ] * Vdd 

     Idd(Max) = [ 116 mA/V + 11.5  mA/(V * MHz) * f ] * Vdd 

where: 


Idd is the supply current in amperes 

f is the CPU frequency in MHz 

Vdd is the power supply voltage in volts 
power dissipation is Vdd * Idd 
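

A short Python sketch of these estimates follows. It evaluates the two
equations with the coefficients in mA as written and converts to
amperes before computing power. The function names and the example
operating point (225 MHz, 3.3 V) are assumptions chosen for
illustration only.

```python
def idd_ma(f_mhz, vdd, worst_case=False):
    """Estimated EV45 supply current in mA at f_mhz (MHz) and vdd (V)."""
    slope = 11.5 if worst_case else 9.56   # mA/(V*MHz): max vs. typical
    return (116.0 + slope * f_mhz) * vdd


def power_w(f_mhz, vdd, worst_case=False):
    """Estimated power dissipation in watts: Vdd * Idd."""
    return vdd * idd_ma(f_mhz, vdd, worst_case) / 1000.0


if __name__ == "__main__":
    print(round(idd_ma(225.0, 3.3) / 1000.0, 2), "A typical")
    print(round(power_w(225.0, 3.3), 1), "W typical")
    print(round(power_w(225.0, 3.3, worst_case=True), 1), "W maximum")
```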


1.1 AC Timings 


This section describes the AC timing specifications for the nominal 
speed EV45 part, which as described above should provide a 225 MHz 
internal operating frequency. These timings are specified and 
measured in exactly the same way as were the AC timings for EV4. 
Refer to the DECchip 21064 Microprocessor Hardware Reference Manual 
for details. 


1.2 BCache Read Loop 


The external flow through delay of the BCache read loop, as defined 
in section 7.3.5 of the DECchip 21064 Hardware Reference Manual, 
must not exceed the overall BCache read time (BC_RD_SPD+1 CPU 
cycles) less 4.0 ns. 
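

The limit above reduces to a one-line calculation, sketched here in
Python. The function name and the example values (BC_RD_SPD of 4 at a
200 MHz CPU clock) are hypothetical and serve only to show the
arithmetic.

```python
def max_external_read_delay_ns(bc_rd_spd, cpu_mhz):
    """Largest allowed external flow-through delay of the BCache read loop.

    The overall BCache read time is (BC_RD_SPD + 1) CPU cycles; the
    external delay must not exceed that time less 4.0 ns.
    """
    cycle_ns = 1000.0 / cpu_mhz
    return (bc_rd_spd + 1) * cycle_ns - 4.0


if __name__ == "__main__":
    print(max_external_read_delay_ns(4, 200.0), "ns")
```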


1.3 External Cycles 


Enable, sysClkOut1_h to 

     adr_h         -1.0    2.0 
     data_h        -1.0    2.0 
     check_h       -1.0    2.0 


Output Delay, sysClkOut1_h to 

     adr_h         -1.0    1.0 
     data_h        -1.0    1.0 
     check_h       -1.0    1.0 
     cReq_h        -1.0    1.0 
     cWMask_h      -1.0    1.0 
     holdAck_h     -1.0    1.0 


     dInvReq_h 
     iAdr_h 
     perf_cnt_h 
     data_h 
     check_h 


     holdReq_h 
     dInvReq_h 
     iAdr_h 
     perf_cnt_h 
     data_h 
     check_h 
2 ICACHE INCREASED TO 16KB 


The instruction cache will be increased in size from 8 KB to 16 KB. 
It will be direct mapped, virtually addressed using VA<13>, and 
physically tagged. Since the ICache is never written and does not 
contain coherence hardware no extra logic will be required to manage 
virtual synonyms. 


3 DCACHE INCREASED TO 16KB 


The data cache will also be increased from 8 KB to 16 KB, remain 
physically tagged and direct mapped, and become virtually addressed 
using VA<13>. The DCache requires additional logic to manage 
virtual synonyms, however. In addition, the DCache coherence 
interface requires changes. As externally viewed the DCache appears 
to be two-way set associative, which implies that systems which 
employ a backmap to filter invalidates need more information to 
maintain the backmap. 


o For external read transactions, cWMask_h<3> and cWMask_h<4> will 
each carry virtual address bit <13>. This information is 
duplicated on these pins so that both slices of the Cobra bus 
interface ASICs will have access to it. NVAX+ places the set 
number on cWMask_h<3>. 


o EV45 will include a second backmap write enable output pin. It 
will assert one of dMapWe_h<1:0> during D-stream backup cache 
reads to indicate where the block is being allocated in the 
DCache. dMapWe_h<0> will assert if VA<13> generated by the 
originating load instruction was zero, while dMapWe_h<1> will 
assert if it was one. The new output, dMapWe_h<1>, will be 
placed on spare<0> (PGA location M24) in order to match NVAX+. 

o EV45 will include a second invalidate request input. External 
logic may assert one or both of dInvReq_h<1:0> along with 
iAdr_h<12:5> to invalidate DCache lines. The new input, 
dInvReq_h<1>, will be placed on spare<3> (PGA location C24) in 
order to match NVAX+. 


Systems which do not include a DCache backmap can simply tie 
dInvReq_h<1:0> together. Alternatively, they can set ABOX_CTL<15>. 
This bit, when set, has the effect of asserting dInvReq_h<1> when 
dInvReq_h<0> is asserted, and is intended for use by existing 
EV4-based systems which do not use dInvReq_h<1>. 
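
The invalidate behavior described above can be sketched as a small model.
This is illustrative only: it assumes 32-byte blocks and assumes that
dInvReq_h<n> selects the DCache half indexed with VA<13> = n (by analogy
with dMapWe_h<1:0>; the text does not state this mapping explicitly). The
function name and arguments are hypothetical.

```python
def invalidated_blocks(dinvreq, iadr_12_5, double_inval=False):
    """Return the DCache block indices (0..511) hit by an invalidate.

    dinvreq      -- 2-bit value of dInvReq_h<1:0>
    iadr_12_5    -- 8-bit value of iAdr_h<12:5>
    double_inval -- state of ABOX_CTL<15>
    Assumption: dInvReq_h<n> selects the half indexed with VA<13> = n.
    """
    if double_inval and (dinvreq & 0b01):
        dinvreq |= 0b10  # ABOX_CTL<15>: asserting <0> also asserts <1>
    blocks = []
    for half in (0, 1):
        if dinvreq & (1 << half):
            blocks.append((half << 8) | (iadr_12_5 & 0xFF))
    return blocks
```

Under these assumptions, a system without a backmap that sets ABOX_CTL<15>
invalidates both candidate blocks with a single dInvReq_h<0> assertion.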


In order to provide compatibility with existing system designs which 
use a DCache backmap, EV45 will include a mode in which the DCache 
reverts to 8 KB. This mode will be controlled via ABOX_CTL<12>. 
When ABOX_CTL<12> is clear the DCache will operate in 8 KB mode, and 
when it’s set the DCache will operate in 16 KB mode. 
4 NEW FLOATING POINT DIVIDER 


EV45 will include new floating point divide hardware which 
implements a nonrestoring, normalizing, variable shift (maximum 
4-bits/cycle) algorithm that retires 2.4 bits per cycle on average. 
The average overall divide latency including pipeline overhead will 
be 29 cycles for double precision and 19 cycles for single precision 
vs. 63 and 34 cycles in EV4. 


The new divider will also address EV4's noncompliant IEEE divide 
behavior by calculating the inexact flag, setting FPCR<INE> if 
appropriate, and trapping on DIVx/SI instructions only when the 
result is really inexact. 


5 IMPROVED BRANCH PREDICTION 


EV45 will include an improved branch prediction scheme which uses a 
4K by 2-bit history table. The table will be indexed using the same 
bits used to index the ICache. Each 2-bit table entry behaves as a 
counter which increments on taken branches (stopping at binary 11) 
and decrements on not-taken branches (stopping at binary 00). If 
the upper bit of the counter is set, the branch is predicted taken. 
The contents of the table will not be disturbed by ICache fills. As 
in EV4, EV45 will also include a static branch prediction mode which 
uses the sign bit of the branch displacement. 
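
The 2-bit counter behavior described above can be sketched as follows.
This is a minimal model, not the hardware implementation: the class name
is hypothetical, the caller is assumed to supply the table index (the
same bits used to index the ICache), and the initial counter value of
zero is an assumption.

```python
class BranchHistoryTable:
    """4K entries of 2-bit saturating counters."""

    def __init__(self, entries=4096):
        self.counters = [0] * entries  # initial value is an assumption

    def predict_taken(self, index):
        # Predicted taken when the upper bit of the counter is set.
        return (self.counters[index] & 0b10) != 0

    def update(self, index, taken):
        count = self.counters[index]
        if taken:
            self.counters[index] = min(count + 1, 0b11)  # stop at 11
        else:
            self.counters[index] = max(count - 1, 0b00)  # stop at 00
```

With this scheme a branch that has saturated at binary 11 is still
predicted taken after a single not-taken outcome.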


6 PARITY FOR ICACHE AND DCACHE. 


The ICache and DCache will include parity protection. Each cache 
line will contain a tag parity bit and eight longword data parity 
bits. The ICache tag parity bit will be calculated across the ASN 
and ASM bits in addition to the tag address. 


DCache and ICache parity errors will generate a machine check, if so 
enabled by ABOX_CTL<MCHK_EN>, and will set C_STAT (formerly DC_STAT) 
register bits <4> and <5>, respectively. These bits will be cleared 
when the C_STAT register is read. ICache parity errors will be 
recoverable - the PAL machine check handler can flush the ICache and 
return. DCache parity errors will not be recoverable. 


Primary cache parity checking can be disabled by setting ABOX_CTL 
bit <14>. 


In order to ease an on-chip circuit path, EV45 will derive primary 
cache fill data parity from the parity or ECC check bits received 
from off-chip. In longword parity mode, the externally supplied 
parity bit will be used. In byte parity mode (see below) the four 
externally supplied byte parity bits will be XOR’d to generate 
longword parity. In ECC mode the externally supplied check bits 
will be XOR’d to generate longword parity, and single-bit errors in 
the check bits will be corrected and reused to calculate longword 
parity. External read transactions for which no parity or ECC bits 
are supplied with the response data represent a problem under this 
scheme. The solution will be to add a bit in each cache tag which 
when set overrides parity checking across the associated cache data. 
This bit will be set for fills from external reads which aren’t 
accompanied by parity or check bits. 
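
The byte parity case above relies on the fact that XORing the four even
byte parity bits of a longword yields the even parity of the whole
longword. The sketch below demonstrates only that identity; the function
names are hypothetical, and the parity sense stored inside the primary
caches is not modeled.

```python
def even_parity(value):
    """Parity bit that makes the total number of ones even."""
    return bin(value).count("1") & 1


def longword_parity_from_bytes(longword):
    """Derive longword parity by XORing the four byte parity bits."""
    derived = 0
    for i in range(4):
        byte = (longword >> (8 * i)) & 0xFF
        derived ^= even_parity(byte)
    return derived
```

For any 32-bit value, longword_parity_from_bytes(x) equals
even_parity(x), so no access to the data bits themselves is needed once
the byte parity bits are available.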


Since internal cache data parity is derived from the parity which 
accompanies the cache fill data, and since EV45 has a mechanism for 
creating bad parity in the backup cache, no explicit diagnostic 
hooks are required for the primary cache data parity function. 
Diagnostic code for systems which employ ECC needs to briefly operate 
the chip in parity mode in order to use this diagnostic method. 
Internal cache tag parity diagnostics can be written using bit <13> 
of the ABOX_CTL register. This bit, when set, will generate 
incorrect tag parity for both ICache and DCache fills. 


7 BYTE PARITY ON EXTERNAL DATA BUS 


EV45 will include a mode for byte parity on the external data bus in 
order to allow it to more easily interface with industry standard 
peripherals which support byte parity. This mode will be controlled 
by BIU_CTL<37>: 


BIU_CTL<37>      BIU_CTL<1>   |
(BYTE_PARITY)    (ECC)        |  mode
------------------------------+---------------
     0             0          |  LW parity
     x             1          |  ECC
     1             0          |  byte parity


BYTE_PARITY and ECC are cleared by chip reset. 
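
The mode selection in the table above can be expressed as a short
function. This is a sketch for illustration; the function name is
hypothetical and the arguments stand for the two BIU_CTL bits.

```python
def check_mode(byte_parity, ecc):
    """Decode BIU_CTL<37> (BYTE_PARITY) and BIU_CTL<1> (ECC)."""
    if ecc:
        return "ECC"  # BYTE_PARITY is ignored when ECC is set
    return "byte parity" if byte_parity else "LW parity"
```

Since both bits are cleared by chip reset, the chip comes up in longword
parity mode.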


In byte parity mode the check_h pins carry EVEN parity across the 
associated data_h pins. The correspondence between data_h and 
check_h pins is shown below: 

data_h             |  check_h
-------------------+--------------
data_h<7:0>        |  check_h<0>
data_h<15:8>       |  check_h<1>
data_h<23:16>      |  check_h<2>
data_h<31:24>      |  check_h<3>
data_h<39:32>      |  check_h<7>
data_h<47:40>      |  check_h<8>
data_h<55:48>      |  check_h<9>
data_h<63:56>      |  check_h<10>
data_h<71:64>      |  check_h<14>
data_h<79:72>      |  check_h<15>
data_h<87:80>      |  check_h<16>
data_h<95:88>      |  check_h<17>
data_h<103:96>     |  check_h<21>
data_h<111:104>    |  check_h<22>
data_h<119:112>    |  check_h<23>
data_h<127:120>    |  check_h<24>
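
As an illustration of the correspondence above, the following sketch
computes the value each check_h pin would carry for a given 128-bit
data_h value in byte parity mode. The names are hypothetical; the pin
list is transcribed from the table, with byte 0 being data_h<7:0>.

```python
# check_h pin number for data bytes 0..15, per the table above.
CHECK_PIN_FOR_BYTE = [0, 1, 2, 3, 7, 8, 9, 10,
                      14, 15, 16, 17, 21, 22, 23, 24]


def byte_parity_check_bits(data):
    """Return {check_h pin number: parity bit} for a 128-bit data value."""
    bits = {}
    for byte_index, pin in enumerate(CHECK_PIN_FOR_BYTE):
        byte = (data >> (8 * byte_index)) & 0xFF
        # Even parity: the check bit makes the total count of ones even.
        bits[pin] = bin(byte).count("1") & 1
    return bits
```

A byte with an odd number of ones therefore drives its check_h pin to
one, and the remaining check_h pins are not used in this mode.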


8 INTERNAL SYNCHRONIZERS FOR TAGOK_H, TAGOK_L 


The tagOk function as currently defined does not scale well at 
higher clock rates and will be almost impossible to use for EV45 
systems running below 4ns. In order to alleviate this inherent 
timing constraint the synchronizers currently implemented off-chip 
on the Laser and Cobra modules will be implemented on-chip in EV45. 
Three cycles of worst-case synchronizer delay will be added to the 
current internal tagOk path. The tagOk_h and tagOk_l inputs will 
become single ended inputs referenced to VREF. Either input can be 
used to control the tagOk function. Systems which use tagOk_h 
should tie tagOk_l to VSS, while systems which use tagOk_l should 
tie tagOk_h to VDD. Systems which don’t use the tagOk function 
should tie tagOk_h to VDD and tagOk_l to VSS. 


9 NEW MODE FOR DMAPWE_H PINS 


At the Laser team’s request, EV45 will include a mode in which the 
dMapWe_h pins assert during both I-stream and D-stream backup cache 
reads. This makes it possible to build external hardware to track 
the frequency with which given BCache blocks are accessed, and to 
base BCache allocation on this information to improve overall 
performance. This mode will be controlled via BIU_CTL<39>, IMAP_EN. 
When IMAP_EN is set, dMapWe_h<1:0> will assert during both I-stream 
and D-stream backup cache reads. For D-stream reads one of 
dMapWe_h<1:0> will assert based on which half of the DCache is being 
allocated. Which dMapWe_h pin asserts for I-stream reads is 
UNPREDICTABLE. 


10 TAGEQ_L FUNCTION REMOVED 


The tagEq_l function was originally proposed by the Flamingo design 
team but is not used in the Flamingo design, and as currently 
defined is not useful for any existing design. This function will 
therefore be removed from EV45. 


11 NEW MODE FOR LDX/L AND STX/C HANDLING. 


EV45 will contain a new operating mode, fast lock, which will 
improve the performance of LDx/L and STx/C instructions in systems 
designed to support this mode. Fast lock mode is enabled by setting 
BIU_CTL<44>, and can only be used with OE-mode BCache RAMs, i.e., 
when BIU_CTL<2> is set. 


When operating in fast lock mode, EV45 attempts to service LDx/L and 
STx/C instructions using BCache-hit timing. Two new pin functions 
will also be used to support this mode: lockWe_h and lockFlag_h. 
LockWe_h will use the pin previously used by tagEq_l, and lockFlag_h 
will use the pin previously used by tagAdr_h<17>. 


For LDx/L, EV45 performs a 32-byte BCache read if the address hits a 
valid BCache block. In addition, it will assert lockWe_h with the 
same timing as it asserts dMapWe_h. It is intended that module 
level hardware use the assertion of lockWe_h and dataCEOE_h to set 
the lock flag and load the lock address register. Further, it is 
assumed that the module design uses the tagOk mechanism (or some 
other module-level means) to ensure that this operation does not 
conflict with module-level access to the lock hardware. If the 
probe does not hit a valid BCache block, EV45 will start a 
sysClkOut-timed LDx/L transaction on cReq_h<2:0>. 


For STx/C, EV45 will probe the BCache and sample lockFlag_h. If the 
probe hits a valid nonshared BCache block and lockFlag_h is 
asserted, EV45 will perform the BCache write and assert lockWE_h 
with the same timing as tagCtlWE_h. It is intended that module 
level hardware uses the assertion of tagWE_H and the deassertion of 
dataCEOE_h to clear the lock flag. If the probe doesn’t hit a valid 
nonshared BCache block, or if lockFlag_h is deasserted, then EV45 
will start a sysClkOut-timed STx/C transaction on cReq_h<2:0>. 
LockFlag_h has the same timing requirements as tagAdr_h<33:18>. 
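
The two fast lock decisions described above can be summarized in a short
sketch. The function names and the returned strings are hypothetical and
serve only to illustrate which path is taken; pin timing is not modeled.

```python
def fast_lock_ldx_l(hit_valid_block):
    """Path taken by LDx/L in fast lock mode."""
    if hit_valid_block:
        return "BCache read, assert lockWe_h"
    return "external LDx/L transaction on cReq_h<2:0>"


def fast_lock_stx_c(hit_valid_nonshared_block, lock_flag):
    """Path taken by STx/C in fast lock mode."""
    if hit_valid_nonshared_block and lock_flag:
        return "BCache write, assert lockWe_h"
    return "external STx/C transaction on cReq_h<2:0>"
```

Note the asymmetry: LDx/L needs only a valid BCache hit, while STx/C
additionally requires the block to be nonshared and lockFlag_h to be
asserted.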


12 TWEAK TO WRITE BUFFER UNLOAD LOGIC. 


The EV4 write buffer implementation does not fully comply with the 
Alpha SRM’s requirement that writes not be buffered indefinitely. 
In EV4, the write buffer attempts to send a buffered write off-chip 
when one of the following conditions is met: 

1. The write buffer contains at least two valid entries. 


2. The write buffer contains one valid entry and 256 cycles have 
elapsed since the execution of the last write. 


3. The buffer contains an MB or STx/C instruction. 

4. A load miss hits an entry in the write buffer. 

Condition 2 above is implemented using an 8-bit counter, and the 
overflow of this counter is used to kick the buffer. The counter is 
cleared when one of the following conditions is met: 

1. The write buffer is empty. 

2. The write buffer unloads an entry. 

3. A write executes. 

Condition 3 above will be removed from the counter’s reset equation 
in EV45, since it permits a sadistic case to cause writes to be 
buffered indefinitely. This case would require an indefinite stream 
of writes which all merge into the same 32-byte buffer entry. 
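
The effect of the counter's reset equation can be illustrated with a
simple cycle-level model. This is a sketch under stated assumptions: the
buffer holds a single valid entry, a new write merges into that entry
every write_interval cycles, and the counter is treated as overflowing
on its 256th increment. The function name and parameters are
hypothetical.

```python
def cycles_until_timeout(write_interval, clear_on_write, max_cycles=10000):
    """Cycle at which the 8-bit counter overflows, or None if it never
    does within max_cycles.

    clear_on_write=True  models the EV4 reset equation (condition 3 kept).
    clear_on_write=False models the EV45 reset equation (condition 3 removed).
    """
    counter = 0
    for cycle in range(1, max_cycles + 1):
        if clear_on_write and cycle % write_interval == 0:
            counter = 0  # an executing write clears the counter
        else:
            counter += 1
        if counter == 256:  # overflow kicks the write buffer
            return cycle
    return None
```

In this model, merging writes arriving every 100 cycles keep the EV4
counter from ever overflowing, while the EV45 equation unloads the entry
after 256 cycles regardless of the write stream.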


13. SUPPORT FOR 3-CYCLE EXTERNAL CACHE READ 


EV4 has a design bug which prevents it from supporting 3-cycle 
external cache reads. This problem will be corrected in EV45. 


14 SYSCLKOUT CHANGES 


14.1 

EV45 will support sysClkOut divisor values between 2 and 17. The 
additional divisors will be encoded using a new input, sysClkDiv_h, 
which will be placed on spare<8> (PGA location AA16). Systems may 
tie sysClkDiv_h to VDD to have access to the additional clock 
ratios. As in EV4, the values placed on irq_h<2:0> during reset 
will also be used to select the sysClkOut_h ratio. The table below 
shows the ratio encodings. 

sysClkDiv_h    irq_h<2>    irq_h<1>    irq_h<0>    sysClk Divisor 

The sysClkOut2_h delay options will be the same as in EV4 - zero, 
one, two or three CPU cycles. When dcOk_h is deasserted, a 
sysClkOut divisor of nine will be applied to the internally 
generated CPU clock. 


14.2 SysClkOut Divider Initialization 


EV45 will include a pin to initialize the sysClkOut divider for chip 
testing. The new pin, resetSClk_h, will be placed on spare<6> (PGA 
location AA11). Appendix A describes the timing of this 
tester-controlled sequence. 


15 CACHE REDUNDANCY 


Each cache in EV45 is physically implemented as two separate arrays. 
Each array contains 66 rows, two of which are redundant and may be 
used for laser repair. As shown in the diagram below, each array 
consists of two subarrays separated by a central row-pair decoder. 
Each subarray has an independent set of fuses for laser programming. 
Rows within each subarray are manipulated as adjacent pairs - within 
a subarray only a single defective adjacent pair may be replaced. 
The subarray fuses can be programmed independently. 


           +-----------+---+---+---+-----------+
           |           | F | D | F |           |
           |           | U | E | U |           |
    33     |   ARRAY   | S | C | S |   ARRAY   |
    Row    |           | E | O | E |           |
    Pairs  |           | S | D | S |           |
           |           |   | E |   |           |
           +-----------+---+---+---+-----------+

16 SERIAL ROM & ICACHE TESTING CHANGES 


Each ICache block in EV45 will contain 18 new bits (eight data 
parity bits, one tag parity bit, one bit to disable data parity 
checks, and eight additional branch history bits), or 311 bits 
altogether. Systems which use serial boot ROMs must supply values 
for each of these bits. Odd parity is used. Only half of the 
ICache can be utilized by the serial boot code. This consists of 
79,616 bits (256 blocks at 311 bits/block). The ICache blocks are 
loaded in sequential order starting with block zero and ending with 
block 255. The table below shows the bit shift order within each 
cache block. Bits are shifted from top to bottom and from left to 
right: 

Name            Required Value
------------------------------

bht<15:0>

lw7<31:0>
lw5<31:0>

dp3
lw3<31:0>
lw1<31:0>
dp1
noDp
v
asm
asn<5:0>
tag<33:13>
tp
dp6
lw6<31:0>
lw4<31:0>
dp4
dp2
lw2<31:0>
lw0<31:0>
dp0



The table also shows the values which must be loaded into each cache 
block’s tag and control bits. 
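
The bit counts quoted in this section can be checked with a few lines of
arithmetic. The field widths below are taken from the field names and
the prose above (eight data parity bits, eight longwords); the
dictionary keys are informal labels, not signal names.

```python
# Width of each field in one ICache block as loaded from the serial ROM.
FIELD_WIDTHS = {
    "bht<15:0>": 16,
    "lw0..lw7": 8 * 32,
    "data parity": 8,
    "noDp": 1,
    "v": 1,
    "asm": 1,
    "asn<5:0>": 6,
    "tag<33:13>": 21,
    "tp": 1,
}

bits_per_block = sum(FIELD_WIDTHS.values())
blocks_loaded = 256  # half of the 16 KB ICache
serial_stream_bits = bits_per_block * blocks_loaded
```

The sums come to 311 bits per block and 79,616 bits for the serial load,
matching the figures given in the text.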


The ICache will be implemented as two physically separate arrays. 
In ICache serial write mode the contents of both arrays will be 
written from the same serial input stream. In ICache serial read 
mode the contents of the two arrays will be shifted onto two pins - 
sRomOe_l and sRomClk_h. Thus the overall test time for the 16 KB 
ICache in EV45 will be about the same as for the 8 KB ICache in EV4. 


In order to fully test EV45 at wafer probe without having to first 
laser repair defective ICache rows, EV45 will include a "soft fuse" 
mode in which defective ICache rows can be electrically replaced. A 
new signal, icMode_h<2>, will be placed on spare<1> (PGA location 
AD7). The table below shows the encoding of icMode_h<2:0>. 

icMode_h<2>  icMode_h<1>  icMode_h<0>   Mode

     L            L            L        Serial ROM Mode
                                        Soft Fuses Disabled

     L            L            H        Serial Interface Disabled
                                        Soft Fuses Disabled

     L            H            L        ICache Test - Write
                                        Soft Fuses Disabled

     L            H            H        ICache Test - Read
                                        Soft Fuses Disabled

     H            L            L        Load Soft Fuses

     H            L            H        Serial Interface Disabled
                                        Soft Fuses Enabled

     H            H            L        ICache Test - Write
                                        Soft Fuses Enabled

     H            H            H        ICache Test - Read
                                        Soft Fuses Enabled
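
The encoding in the table above can be expressed as a lookup keyed by
the 3-bit mode number. This is an illustrative sketch; the names are
hypothetical, and mode four is given no soft fuse state because the
table lists none for it.

```python
# mode number -> (function, soft fuse state)
IC_MODES = {
    0: ("Serial ROM mode", "disabled"),
    1: ("Serial interface disabled", "disabled"),
    2: ("ICache test - write", "disabled"),
    3: ("ICache test - read", "disabled"),
    4: ("Load soft fuses", None),
    5: ("Serial interface disabled", "enabled"),
    6: ("ICache test - write", "enabled"),
    7: ("ICache test - read", "enabled"),
}


def decode_ic_mode(pins):
    """pins is a 3-character string of 'H'/'L' for icMode_h<2:0>."""
    value = int(pins.replace("H", "1").replace("L", "0"), 2)
    return IC_MODES[value]
```

For example, decode_ic_mode("HLL") selects mode four, the mode used to
load the soft fuses.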


The sRomClk_h divisor in mode zero will change from 126 in EV4 to 
254 in EV45. Modes one through three are identical to their EV4 
counterparts. 


Mode four allows the soft fuses to be written from a serial bit 
stream applied to sRomD_h. Modes five through seven are the same as 
modes one through three, except that the soft fuses are enabled. 
The sequence for using the soft fuses is described below. 


1. Test the ICache using modes two and three. Keep icMode_h<1> 
asserted during this process to prevent the ICache tag valid 
bits from being cleared by chip reset. Determine which 
row-pairs are defective - one defective pair of rows can be 
replaced in each subarray of the ICache. 

2. Program the soft fuses using ICache mode four. The value 
written into the soft fuses is retained as long as power is 
applied to the chip and icMode_h<2> is asserted. 

3. Test the ICache again using ICache modes six and seven. 

4. Test the rest of the chip using ICache mode five. 


Appendix B of this document describes the soft fuse modes in more 
detail. 

17 SUMMARY OF IPR CHANGES 


This section summarizes all IPR differences between EV45 and EV4 
pass 3. 


17.1 New Bits In ABOX_CTL 


Bit    Name            Function
------------------------------------------------------------------------
12                     Set to select 16 KB DCache, clear to select
                       8 KB DCache. Cleared by reset.

13                     Set to generate bad primary cache tag parity
                       on fills. Cleared by reset.

14     NOCHK_PAR       Set to disable checking of primary cache parity.
                       Cleared by reset.

15     DOUBLE_INVAL    When set, dInvReq_h<0> assertions invalidate both
                       DCache blocks addressed by iAdr_h<12:5>.
                       Cleared by reset.


17.2 DC_STAT Renamed To C_STAT (Same Register Number) 


Bit    Name            Function
------------------------------------------------------------------------
2:0                    Hardwired to 101 (bin) to allow PAL to
                       identify EV45.

       DC_HIT          Same as existing DC_HIT bit in EV4.

4      DC_ERR          Set by DCache parity error. Cleared by read
                       of C_STAT register.

5      IC_ERR          Set by ICache parity error. Cleared by read
                       of C_STAT register.


17.3 New Bits In BIU_CTL 


Bit    Name            Function
------------------------------------------------------------------------
37     BYTE_PARITY     If set when BIU_CTL<ECC> is cleared, external
                       byte parity is selected. When BIU_CTL<ECC>
                       is set this bit is ignored. BYTE_PARITY is
                       cleared by reset.

39     IMAP_EN         Set to allow dMapWe_h<1:0> to assert for
                       I-stream backup cache reads. Cleared by
                       reset.

44     FAST_LOCK       When set, FAST_LOCK mode operation is selected.
                       This mode can only be used when BIU_CTL<2> is
                       also set, indicating that OE-mode BCache RAMs
                       are used. Cleared by reset.


18 SUMMARY OF EXTERNAL INTERFACE CHANGES 


This section summarizes all external interface differences between 
EV45 and EV4 pass 3. 


18.1 New Use For Former Spare Pins 


Pin Type    New Name        Old Name    PGA Location
O           dMapWe_h<1>     spare<0>    M24
I           icMode_h<2>     spare<1>    AD7
I           dInvReq_h<1>    spare<3>    C24
I           resetSClk_h     spare<6>    AA11
I           sysClkDiv_h     spare<8>    AA16


The new inputs above will have internal pulldowns which will draw a 
maximum current of 200 uA at 2.4V. 


18.2 Renamed Pins 


Pin Type    New Name        Old Name        PGA Location
O           lockWE_h        tagEq_l         P24
I           lockFlag_h      tagAdr_h<17>    R23


18.3 Pins With New Functions Or Other Differences 


18.3.1 TagOk_h, TagOk_l - 


EV45 will include an on-chip synchronizer circuit for tagOk_h and 
tagOk_l which will add a worst case delay of three CPU cycles to the 
path. tagOk_h and tagOk_l will both be single-ended inputs 
referenced to VREF. Systems which use tagOk_h should tie tagOk_l to 
VSS. Systems which use tagOk_l should tie tagOk_h to VDD. Systems 
which do not use the tagOk function should tie tagOk_h to VDD and 
tagOk_l to VSS. 


18.3.2 Irq_h<2:0> - 


The value 111 (bin) placed on irq_h<2:0> during reset will select a 
sysClkOut ratio of nine for EV45 vs. eight for EV4. 


18.3.3 CWMask_h<4:3> - 


During READ_BLOCK and LDx/L transactions these pins will contain 
virtual address bit <13>, which should be used as a "set number" 
when allocating backmap entries in 16 KB DCache mode. 


18.3.4 Check_h<27:0> - Some of these pins will be used to carry 
parity in byte parity mode. 




APPENDIX A 


SYSCLKOUT DIVIDER INITIALIZATION SEQUENCE 


resetSClk_h is a test pin used to place EV45’s system clock divider into 
a known state. The sequence begins with resetSClk_h being asserted for 
a minimum of ten CPU cycles. While resetSClk_h is asserted the system 
clock outputs are deasserted. ResetSClk_h should be deasserted 
synchronously to EV45’s internal CLK signal. ResetSClk_h is sampled by 
EV45 at the rising edge of CLK, and the first rising edge of 
sysClkOut1_h will occur five CLK cycles after the point at which EV45 
samples resetSClk_h’s deassertion. The figure below shows this 
sequence. 


[Figure: timing of cpuClkOut_h, resetSClk_h, and sysClkOut1_h during 
the divider initialization sequence] 

APPENDIX B 


ICACHE SOFT FUSE MODES 


The ICache consists of two arrays each containing two subarrays. Each 
subarray contains 32 row-pairs, plus one additional redundant row-pair. 
The figure below shows how the ICache partitions the physical address: 


   33                                          13
  +----------------------------------------------+
  |                     TAG                      |
  +----------------------------------------------+

   13    12 11    10        6     5     4  3
  +----+-------+--------------+-----+-------+
  |    |       |              |     |       |
  +----+-------+--------------+-----+-------+
     |     |          |          |      |
     |     |          |          |      +------> bank sel
     |     |          |          +-------------> row in row-pair
     |     |          +------------------------> row-pair index
     |     +-----------------------------------> column mux
     +-----------------------------------------> array select
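
The partitioning shown in the figure can be sketched as a field
extraction. The field boundaries below follow the bit positions in the
figure (<13>, <12:11>, <10:6>, <5>, <4:3>) and the tag range <33:13>;
the function and key names are hypothetical.

```python
def icache_fields(addr):
    """Split an address into the ICache fields shown in the figure."""
    return {
        "tag": (addr >> 13) & 0x1FFFFF,        # bits <33:13>
        "array_select": (addr >> 13) & 0x1,    # bit <13>
        "column_mux": (addr >> 11) & 0x3,      # bits <12:11>
        "row_pair_index": (addr >> 6) & 0x1F,  # bits <10:6>
        "row_in_pair": (addr >> 5) & 0x1,      # bit <5>
        "bank_sel": (addr >> 3) & 0x3,         # bits <4:3>
    }
```

The five-bit row-pair index addresses the 32 ordinary row-pairs of a
subarray, and the two-bit column mux field selects one of the four cache
blocks held in each row.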


The next figure shows a block diagram of the cache. Each row contains 
four complete cache blocks; address bits <12:11> control the column mux. 
The top array is selected when address bit <13> is one, otherwise the 
bottom array is selected. In ICache test modes three and seven, EV45 
serially drives the top array’s inverted contents onto sRomOe_l and the 
bottom array’s inverted contents onto sRomClk_h. 

[Figure: ICache block diagram showing 33 row-pairs per subarray and 
four cache blocks per row] 


The left subarrays contain the odd numbered longwords and associated 
parity for each cache block. The right subarrays contain the even 
numbered longwords and associated parity, the tag, and tag control bits 
of each block. 


The soft fuses form a serial shift register which can be loaded via the 
sRomD_h pin using ICache test mode four (icMode_h<2:0> = HLL). Each 
subarray has 32 fuses for disabling a particular row-pair, and one fuse 
for enabling the redundant row-pair. In addition, there are five fuses 
for programming each redundant row-pair’s decoder. Overall there are 
152 soft fuse bits in the ICache [(32+1+5) * 4]. The order in which 
these bits are loaded via the serial shift chain is shown below. Bits 
shift from top to bottom and left to right: 


Name              Function

DEC_TL <0:4>      Redundant row-pair decode, top left
DEC_TR <4:0>      Redundant row-pair decode, top right
ENR_TR <0>        Enable redundant row-pair, top right
DIS_TR <31:0>     Disable row-pair, top right
ENR_BR <0>        Enable redundant row-pair, bottom right
DIS_BR <31:0>     Disable row-pair, bottom right
DIS_BL <0:31>     Disable row-pair, bottom left
ENR_BL <0>        Enable redundant row-pair, bottom left
DEC_BR <4:0>      Redundant row-pair decode, bottom right
DEC_BL <0:4>      Redundant row-pair decode, bottom left
DIS_TL <0:31>     Disable row-pair, top left
ENR_TL <0>        Enable redundant row-pair, top left


To disable a particular row-pair place a zero in its associated shift 
chain location. To enable a redundant row-pair place a zero in its 
shift chain location. Write true addresses into the redundant decoder 
shift chain locations, e.g., to make a redundant row-pair replace 
row-pair zero, place all zeros in its associated shift chain locations. 
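
The programming rules above can be sketched for a single subarray as
follows. This is illustrative only: the function name and return layout
are hypothetical, fuses that are not being used are assumed to hold one,
and the decoder value for an unused redundant row-pair is assumed to be
a don't-care (shown as zeros). Ordering of the values within the serial
shift chain is not modeled.

```python
TOTAL_SOFT_FUSES = 4 * (32 + 1 + 5)  # four subarrays


def subarray_fuses(replace_row_pair=None):
    """Fuse values for one subarray.

    DIS[i] is the disable fuse for row-pair i, ENR is the redundant
    row-pair enable fuse, and DEC[i] is decoder address bit i.
    A zero disables a row-pair or enables the redundant row-pair.
    """
    if replace_row_pair is None:
        # No repair: assumed inactive values.
        return {"DIS": [1] * 32, "ENR": 1, "DEC": [0] * 5}
    if not 0 <= replace_row_pair < 32:
        raise ValueError("row-pair index must be 0..31")
    dis = [1] * 32
    dis[replace_row_pair] = 0
    dec = [(replace_row_pair >> bit) & 1 for bit in range(5)]  # true address
    return {"DIS": dis, "ENR": 0, "DEC": dec}
```

Replacing row-pair zero, for instance, yields a zero in DIS[0], a zero
in ENR, and all zeros in DEC, as in the example given in the text.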


The figure below shows the timing for loading the soft fuse latches. 
The bit rate for this sequence is one bit per two CPU cycles. The chip 
tester must assert icMode_h<1> one CPU cycle before EV45 samples the 
last bit of the soft fuse shift chain. EV45 should sample this 
assertion of icMode_h<1> in the same CPU cycle in which it samples the 
last soft fuse shift chain bit. 